CN112908421A - Tumor neogenesis antigen prediction method, device, equipment and medium - Google Patents
- Publication number: CN112908421A
- Application number: CN202110303245.1A
- Authority: CN (China)
- Prior art keywords: sequence, tumor, peptide fragment, affinity, affinity score
- Legal status: Granted
Classifications
- G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G06N20/00: Machine learning
- G06N3/044: Recurrent networks, e.g. Hopfield networks
- G06N3/048: Activation functions
- G06N5/01: Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
- G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
- Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The embodiments of the invention disclose a method, device, equipment and medium for predicting tumor neoantigens. The method comprises: obtaining a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient; inputting the leukocyte antigen sequence and the peptide fragment sequence into a trained affinity score prediction model to obtain their affinity score; and determining, according to the affinity score, whether the peptide fragment sequence is a tumor neoantigen. Because the affinity score prediction model extracts features automatically, including deeper features, the accuracy of the affinity score is improved. This addresses the low accuracy and efficiency of determining tumor neoantigens, improving both while reducing labor cost.
Description
Technical Field
The embodiments of the invention relate to the field of bioinformatics, and in particular to a method, device, equipment and medium for predicting tumor neoantigens.
Background
A tumor neoantigen is a "non-self" new protein or polypeptide that does not originally exist in the human body and is recognized by human antigen-presenting cells. Such polypeptides are formed mainly by the degradation of mutant proteins arising from mutations in tumor cells, and tumor neoantigens are a key factor in triggering the body's initial immune response against tumor cells.
Current prediction methods for tumor neoantigens fall into three classes: structure-based methods; methods that predict the affinity between a peptide fragment and a leukocyte antigen (HLA) molecule from a position-specific scoring matrix; and machine learning-based methods.
Structure-based methods calculate the minimum free energy of the peptide-HLA complex, but because only a limited number of crystal structures are available, prediction is very slow and inaccurate. Methods based on position-specific scoring matrices have linear computational complexity, much lower than the nonlinear complexity of structure-based and machine learning-based methods; however, features must be set for similar motifs and position-specific scoring functions must be constructed with expert knowledge, which makes the process complex and tedious and the accuracy low. Machine learning-based methods, using models such as support vector machines, hidden Markov models and simple neural networks, achieve good average AUC values across a large number of HLA types and provide useful prediction tools. However, they require the contribution of each residue at each position of the peptide to be considered in order to build a quantitative matrix as model input. These residue contribution scores must be revisited repeatedly, which demands substantial expertise and experience, and high-level features cannot be extracted automatically. Their precision is also insufficient for certain HLA types that occur in most people, and because of the nonlinearity of high-level features they are inefficient when predicting the large number of peptides generated from whole-genome and transcriptome sequencing data. Combined approaches improve performance, but it remains unsatisfactory.
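The position-specific scoring idea behind the second class of prior methods can be illustrated with a toy scorer: each residue contributes a score that depends on its position, and the peptide's score is the sum. This is why such methods are linear and cheap but depend on hand-built matrices. The sketch below is only an illustration; the matrix layout, the function name `pssm_score`, and every weight are hypothetical and are not taken from any real HLA allele or published tool.

```python
from typing import Dict, List

# One dict per peptide position: residue -> contribution score.
PssmColumn = Dict[str, float]


def pssm_score(peptide: str, matrix: List[PssmColumn], default: float = 0.0) -> float:
    """Linear score: sum of the per-position contribution of each residue."""
    if len(peptide) != len(matrix):
        raise ValueError("peptide length must match the number of PSSM columns")
    return sum(col.get(res, default) for res, col in zip(peptide, matrix))


# Hypothetical 9-mer matrix: only positions 2 and 9 carry weight in this toy.
toy_matrix: List[PssmColumn] = [{} for _ in range(9)]
toy_matrix[1] = {"L": 2.0, "M": 1.5}  # position 2 (hypothetical weights)
toy_matrix[8] = {"V": 2.0, "L": 1.0}  # position 9 (hypothetical weights)

print(pssm_score("SLYNTVATL", toy_matrix))  # 3.0
```

Every weight has to be chosen by hand or fitted separately, which is the manual effort the description attributes to this class of methods.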
Disclosure of Invention
The embodiments of the invention provide a tumor neoantigen prediction method, device, equipment and medium, which improve the accuracy of determining tumor neoantigens and reduce labor cost.
In a first aspect, the embodiments of the present invention provide a tumor neoantigen prediction method, comprising:
obtaining a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient;
inputting the leukocyte antigen sequence and the peptide fragment sequence into a trained affinity score prediction model to obtain their affinity score;
and determining, according to the affinity score, whether the peptide fragment sequence is a tumor neoantigen.
In a second aspect, the embodiments of the present invention further provide a tumor neoantigen prediction device, comprising:
a sequence acquisition module, configured to acquire a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient;
an affinity score acquisition module, configured to input the leukocyte antigen sequence and the peptide fragment sequence into a trained affinity score prediction model to obtain their affinity score;
and a tumor neoantigen determination module, configured to determine, according to the affinity score, whether the peptide fragment sequence is a tumor neoantigen.
In a third aspect, an embodiment of the present invention further provides a tumor neoantigen prediction apparatus, where the tumor neoantigen prediction apparatus includes:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the tumor neoantigen prediction method provided by any embodiment of the invention.
In a fourth aspect, the embodiments of the present invention further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the tumor neoantigen prediction method provided in any embodiment of the present invention.
According to the technical solution of the embodiments of the invention, a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient are obtained and input into a trained affinity score prediction model to obtain their affinity score. The affinity score is determined automatically: the model extracts features by itself, including deeper features, without features being set based on similar motifs or scores being assigned to the contribution of each motif, which improves the accuracy of the affinity score. Whether the peptide fragment sequence is a tumor neoantigen is then determined from the affinity value. This solves the problems of low accuracy and efficiency in determining tumor neoantigens, improves both, and reduces labor cost.
Drawings
FIG. 1 is a flowchart of a tumor neoantigen prediction method according to the first embodiment of the present invention;
FIG. 2 is a schematic diagram comparing the prediction accuracy and loss values on the training set and the validation set for the DeepTNA tumor neoantigen deep learning model according to the first embodiment of the present invention;
FIG. 3 is a schematic diagram of the DeepTNA tumor neoantigen model according to the first embodiment of the present invention;
FIG. 4 is a flowchart of a tumor neoantigen prediction method according to the second embodiment of the present invention;
FIG. 5 compares the tumor neoantigen prediction accuracy of the second embodiment of the present invention with that of existing prediction models;
FIG. 6 is a schematic structural diagram of a tumor neoantigen prediction device according to the third embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a tumor neoantigen prediction apparatus according to the fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
FIG. 1 is a flowchart of a tumor neoantigen prediction method according to the first embodiment of the present invention. The method can be performed by a tumor neoantigen prediction device and includes the following steps:
S110, obtaining a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient.
At present, tumor neoantigens can be predicted from the affinity between a tumor patient's leukocyte antigen sequence and peptide fragment sequence. Human leukocyte antigens (HLA) are highly polymorphic alloantigens; chemically they are glycoproteins formed by the non-covalent association of a glycosylated α heavy chain and a β light chain. The amino-terminal part of the peptide chain faces outward (about three quarters of the whole molecule), the carboxy-terminal part extends into the cytoplasm, and the central hydrophobic part lies in the membrane. A peptide is a chain formed by the dehydration condensation of amino acids. Optionally, the leukocyte antigen sequence and the peptide fragment sequence of the tumor patient are obtained from the Immune Epitope Database (IEDB) or from the relevant literature. Their affinity score is then obtained from these sequences, and whether the peptide fragment sequence is a tumor neoantigen is determined from the affinity score.
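A minimal container for one model input, pairing an HLA sequence with a peptide, might look like the following. The class name `HlaPeptidePair` and the check against the 20 standard amino-acid letters are assumptions added for illustration. The description only states that the sequences come from IEDB or the literature. The HLA string used below is an arbitrary placeholder, not a real allele.

```python
from dataclasses import dataclass

# The 20 standard amino-acid one-letter codes (assumed input alphabet).
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


@dataclass(frozen=True)
class HlaPeptidePair:
    """One model input: a leukocyte antigen (HLA) sequence and a peptide sequence."""

    hla_sequence: str
    peptide: str

    def __post_init__(self) -> None:
        # Reject empty strings and letters outside the standard alphabet.
        for field_name in ("hla_sequence", "peptide"):
            seq = getattr(self, field_name)
            if not seq:
                raise ValueError(f"{field_name} is empty")
            unknown = set(seq) - STANDARD_AA
            if unknown:
                raise ValueError(f"{field_name} has non-standard residues: {sorted(unknown)}")


# Placeholder HLA string (not a real allele) paired with a 9-mer peptide.
pair = HlaPeptidePair(hla_sequence="ACDEFGHIKLMNPQRSTVWY", peptide="SLYNTVATL")
print(pair)
```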
S120, inputting the leukocyte antigen sequence and the peptide fragment sequence into a trained affinity score prediction model to obtain their affinity score.
The obtained leukocyte antigen sequence and peptide fragment sequence are input into the trained affinity score prediction model, which directly outputs their affinity score. The method does not require features to be set based on similar motifs or scores to be assigned to the contribution of each motif. The alternative approaches each carry a drawback:
- Predicting the affinity score through a pre-designed, feature-based scoring function, or by feeding a scoring feature matrix into a traditional machine learning model such as a random forest or decision tree, requires manual feature design; the model reduces this labor cost.
- Directly splicing the two sequences and feeding them into a simple neural network or an RNN model gives lower accuracy; the model greatly improves accuracy.
- Processing the two sequences into a regular two-dimensional matrix and feeding it into a trained deep learning network such as a convolutional neural network adds data-processing complexity; the model avoids this step.

As shown in FIG. 2, sample data of the training set are input into the affinity prediction model being trained to obtain the accuracy and loss value on the training set, and sample data of the validation set are input into the trained model to obtain the accuracy and loss value on the validation set. As FIG. 2 shows, both loss values are small (0.26 on the training set and 0.34 on the validation set), and the validation accuracy (87%) is close to the training accuracy (89%). The DeepTNA deep learning model therefore shows no noticeable overfitting or underfitting while achieving high accuracy, so the resulting affinity prediction model DeepTNA can predict affinity values accurately.
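The training and validation figures above can be computed and compared with a few lines of code. This sketch assumes binary labels (binder / non-binder) and binary cross-entropy as the loss, because the model outputs a score between 0 and 1. The description does not name the loss function. The 0.5 accuracy threshold and the helper names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def bce_and_accuracy(probs: Sequence[float], labels: Sequence[int],
                     threshold: float = 0.5, eps: float = 1e-7) -> Tuple[float, float]:
    """Mean binary cross-entropy and accuracy for predicted scores in [0, 1]."""
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and the same length")
    loss, correct = 0.0, 0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        loss += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        correct += int((p >= threshold) == bool(y))
    n = len(labels)
    return loss / n, correct / n


def generalization_gap(train: Tuple[float, float],
                       val: Tuple[float, float]) -> Tuple[float, float]:
    """(val_loss - train_loss, train_acc - val_acc); small gaps suggest little overfitting."""
    return val[0] - train[0], train[1] - val[1]


# Values reported for FIG. 2: (loss, accuracy) on the training and validation sets.
print(generalization_gap((0.26, 0.89), (0.34, 0.87)))
```

With the reported values, the loss gap is about 0.08 and the accuracy gap about 2 percentage points.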
Optionally, as shown in FIG. 3, the affinity score prediction model includes an encoder and a decoder. Inputting the leukocyte antigen sequence and the peptide fragment sequence into the trained affinity score prediction model to obtain their affinity score then includes: encoding the two sequences with the encoder to obtain a sequence encoding; and determining the affinity score from the sequence encoding with the decoder. Because features need not be set based on similar motifs and no scores need to be assigned to the contribution of each motif, the cost of involving experts and of manually setting or labeling features is reduced. At the same time, the peptide fragments do not have to be processed into a regular two-dimensional matrix, which reduces the complexity of data processing. The two sequences are simply input into the trained encoder-decoder deep learning model to obtain their affinity score.
The encoder of the affinity score prediction model encodes the leukocyte antigen sequence and the peptide fragment sequence, converting them into a machine-readable representation. The encoded sequences are then passed to the decoder, which predicts their affinity score. This is more efficient than building a quantitative feature matrix by setting features based on similar motifs or assigning scores to the contribution of each motif, and then computing the affinity score from those hand-set features. Further, when the affinity score prediction model is trained, the length range of the peptide fragments in the training samples is widened, optionally to 8-15 mer, so that the trained model can predict over a wider range of peptide lengths.
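Selecting training samples by peptide length is a simple filter. The sketch keeps samples whose peptide is 8 to 15 residues long, the range given above. The `(hla_sequence, peptide, label)` tuple layout, the placeholder HLA name, and the function name are assumptions.

```python
from typing import Iterable, List, Tuple

Sample = Tuple[str, str, float]  # (hla_sequence, peptide, label): assumed layout

MIN_LEN, MAX_LEN = 8, 15  # peptide length range (mer) given in the description


def filter_by_peptide_length(samples: Iterable[Sample],
                             min_len: int = MIN_LEN,
                             max_len: int = MAX_LEN) -> List[Sample]:
    """Keep samples whose peptide length lies in [min_len, max_len]."""
    return [s for s in samples if min_len <= len(s[1]) <= max_len]


samples = [
    ("HLA_PLACEHOLDER", "SLYNTVA", 0.0),    # 7-mer: dropped
    ("HLA_PLACEHOLDER", "SLYNTVATL", 1.0),  # 9-mer: kept
    ("HLA_PLACEHOLDER", "A" * 15, 0.0),     # 15-mer: kept
    ("HLA_PLACEHOLDER", "A" * 16, 1.0),     # 16-mer: dropped
]
kept = filter_by_peptide_length(samples)
print([len(s[1]) for s in kept])  # [9, 15]
```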
Optionally, encoding the leukocyte antigen sequence and the peptide fragment sequence with the encoder to obtain a sequence encoding includes: splicing (concatenating) the two sequences with the encoder to obtain a spliced sequence; and encoding the spliced sequence with the encoder to obtain the sequence encoding. As shown in FIG. 3, before encoding, the extracted sequence features of the leukocyte antigen sequence and the peptide fragment sequence are input to an integration layer and concatenated into one long vector, which is then encoded. Once encoded, the spliced sequence is in a machine-readable form that the subsequent decoder can process.
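One common way to splice two sequences into a single model input is token-level concatenation with a separator, followed by integer encoding and padding. The description splices extracted features of the two sequences, and does not specify the encoding. The separator token, the id scheme, and the function name below are therefore assumptions.

```python
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD_ID, SEP_ID = 0, 1  # hypothetical reserved ids for padding and the separator
AA_TO_ID: Dict[str, int] = {aa: i + 2 for i, aa in enumerate(AMINO_ACIDS)}


def splice_and_encode(hla_sequence: str, peptide: str, max_len: int) -> List[int]:
    """Concatenate HLA + separator + peptide, map residues to ids, right-pad to max_len."""
    ids = [AA_TO_ID[aa] for aa in hla_sequence] + [SEP_ID] + [AA_TO_ID[aa] for aa in peptide]
    if len(ids) > max_len:
        raise ValueError(f"spliced length {len(ids)} exceeds max_len={max_len}")
    return ids + [PAD_ID] * (max_len - len(ids))


print(splice_and_encode("AC", "DE", max_len=8))  # [2, 3, 1, 4, 5, 0, 0, 0]
```

A fixed `max_len` lets pairs with different peptide lengths (8 to 15 mer) share one input shape. An embedding layer would then map each id to a vector, which amounts to concatenating per-residue feature vectors.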
Optionally, the decoder is built on a gating mechanism and an attention mechanism. In a deep neural network, earlier hidden layers learn more slowly than later ones, so classification accuracy can drop as hidden layers are added; this is known as the vanishing gradient problem. The gradient is computed as a product of many terms: when those terms are small, it decays toward zero, and when very large values appear in the product, the resulting gradient becomes very large. Near a "cliff" in the loss surface, updating with such a gradient takes a very large step that can throw the parameters out of a reasonable region at once; this is called gradient explosion. A gating mechanism alleviates both problems; examples are networks with multiplicative gates such as the GRU and the LSTM. The GRU is a variant of the LSTM with a simpler structure and is regarded here as performing better, so a GRU network is preferred for mitigating gradient explosion and vanishing.
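The "continuous multiplication" argument can be checked numerically. The sketch multiplies the same per-step derivative factor over 20 steps. It then repeats the product for the direct path of a GRU-style update `h_t = (1 - z) * h_{t-1} + z * h_candidate`, where a small update gate `z` keeps the factor near 1. The sketch deliberately ignores the other gradient paths through the gates, so it illustrates the intuition rather than a full gradient computation. The factors 0.5 and 1.5 and the value z = 0.01 are arbitrary.

```python
def chained_factor(per_step_factor: float, steps: int) -> float:
    """Product of the same per-step derivative factor over `steps` layers or time steps."""
    return per_step_factor ** steps


steps = 20
print("vanishing:", chained_factor(0.5, steps))  # about 9.5e-07
print("exploding:", chained_factor(1.5, steps))  # about 3.3e+03

# Simplified GRU-style path: h_t = (1 - z) * h_{t-1} + z * h_candidate.
# Along the direct path, d h_t / d h_{t-1} contains the factor (1 - z);
# with a small update gate z the product stays close to 1 instead of vanishing.
z = 0.01
print("gated path:", chained_factor(1.0 - z, steps))  # about 0.82
```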
The attention mechanism mimics the internal process of biological observation, aligning internal experience with external perception to observe selected regions in finer detail. It gives a neural network the ability to focus on a subset of its inputs (or features), so that important features can be extracted quickly from sparse data. Attention can also reduce the dimensionality of the input, so that the output of the prediction model concentrates on the key parts selected by the attention mechanism. This improves the efficiency and accuracy with which the neural network model processes data.
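A common way to realise "focusing on a subset of the inputs" is dot-product attention. Each position is scored against a query vector, the scores are normalised with a softmax, and the weighted sum is taken. This also reduces a (positions x features) matrix to a single feature vector. The description does not specify which attention variant DeepTNA uses, so scaled dot-product attention with a single query vector is an assumption here. The random inputs only stand in for real features. Requires NumPy.

```python
from typing import Tuple

import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def attention_pool(features: np.ndarray, query: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Scaled dot-product attention with one query vector.

    features: (seq_len, dim) per-position feature vectors
    query:    (dim,) query vector (learned in a real model; random here)
    returns:  (weights of shape (seq_len,), summary vector of shape (dim,))
    """
    scores = features @ query / np.sqrt(features.shape[1])
    weights = softmax(scores)           # non-negative, sums to 1
    return weights, weights @ features  # weighted sum over positions


rng = np.random.default_rng(0)
feats = rng.normal(size=(12, 8))  # e.g. 12 positions with 8 features each
w, summary = attention_pool(feats, rng.normal(size=8))
print(w.round(3), summary.shape)
```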
Optionally, determining the affinity score from the sequence encoding with the decoder includes: extracting target sequence features from the sequence encoding with the decoder; and determining the affinity score from the target sequence features. The encoded leukocyte antigen and peptide fragment sequences are input into the decoder of the affinity score prediction model, which performs feature extraction automatically. The alternative is to build a feature matrix by setting features based on similar motifs or assigning scores to the contribution of each motif. Extracting features with the decoder instead captures deeper, higher-level information about the two sequences, improving both the efficiency and the accuracy of affinity score prediction.
Optionally, extracting the target sequence features from the sequence encoding with the decoder includes: extracting sequence features from the sequence encoding with the gating mechanism; screening the sequence features with the attention mechanism to obtain screened sequence features; and further processing the screened sequence features with the gating mechanism to obtain the target sequence features. After the decoder extracts features from the encoded sequences, the attention mechanism selects the features that most influence the affinity score, which makes feature extraction more efficient. These influential features are then mined further to obtain the target sequence features, from which the affinity score is computed. For example, the target sequence features are passed to a sigmoid function, which maps its input into the interval between 0 and 1 and so yields an affinity score in that range. Deriving the affinity score from the target sequence features in this way improves the efficiency and accuracy of scoring.
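The three decoder stages (gated extraction, attention screening, a second gated pass) followed by the sigmoid can be sketched as a NumPy forward pass. Everything concrete in this sketch is an assumption:
- GRU layers serve as the gating mechanism; the description names the GRU as the preferred gated network.
- Attention is implemented as per-position reweighting, so that the second GRU still receives a sequence.
- The last GRU state is taken as the target sequence feature.
- A linear layer precedes the sigmoid.
- All weights are random and untrained.

The sketch shows data flow and shapes only and is not the patented DeepTNA model.

```python
import numpy as np

rng = np.random.default_rng(0)


def sigmoid(x):
    """Logistic function; maps any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


class GRU:
    """Minimal GRU layer, forward pass only, with random (untrained) weights."""

    def __init__(self, in_dim: int, hid: int):
        self.hid = hid
        # Index 0: update gate z, 1: reset gate r, 2: candidate state n.
        self.W = rng.normal(scale=0.1, size=(3, hid, in_dim))
        self.U = rng.normal(scale=0.1, size=(3, hid, hid))
        self.b = np.zeros((3, hid))

    def __call__(self, xs: np.ndarray) -> np.ndarray:
        h = np.zeros(self.hid)
        outs = []
        for x in xs:  # xs has shape (seq_len, in_dim)
            z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])
            r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])
            n = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
            h = (1.0 - z) * h + z * n  # gated blend of old state and candidate
            outs.append(h)
        return np.stack(outs)  # (seq_len, hid)


def decoder_forward(encoded, gru1, query, gru2, w_out, b_out=0.0):
    feats = gru1(encoded)                             # 1) gated feature extraction
    scores = feats @ query / np.sqrt(feats.shape[1])  # 2) attention score per position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    screened = feats * weights[:, None]               #    reweight ("screen") positions
    target = gru2(screened)[-1]                       # 3) second gated pass; last state
    return float(sigmoid(w_out @ target + b_out))     # 4) affinity score in (0, 1)


seq_len, emb_dim, hid = 30, 16, 32
encoded = rng.normal(size=(seq_len, emb_dim))  # stand-in for the encoder output
score = decoder_forward(encoded, GRU(emb_dim, hid), rng.normal(size=hid),
                        GRU(hid, hid), rng.normal(size=hid))
print(f"affinity score: {score:.3f}")
```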
Predicting the affinity value of the leukocyte antigen sequence and the peptide fragment sequence with an encoder-decoder deep learning model allows high-level features of both sequences to be extracted automatically. It also makes the affinity prediction model easier to update and deploy. The acquired sequences can be input into the model directly to obtain the affinity value, with no data preprocessing and no manual intervention. This reduces the complexity of data processing and improves the efficiency and accuracy of obtaining affinity values.
S130, determining whether the peptide fragment sequence is a tumor neoantigen according to the affinity score.
Tumor-specific antigens (TSAs) are antigens recognized by T cells: genomic variants in a tumor are expressed as tumor-specific peptides (neo-epitopes), which are defined as neoantigens. Unlike tumor-associated antigens, neoantigens are present only in tumor cells. High-quality tumor neoantigens are generally mutant peptides with a higher affinity for leukocyte antigens than the corresponding normal peptides. Whether a peptide fragment is a tumor neoantigen is therefore determined from the affinity score of the leukocyte antigen sequence and the peptide fragment sequence.
According to the technical solution of this embodiment, a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient are obtained and input into a trained affinity score prediction model to obtain their affinity score. The affinity score is determined automatically. Instead of constructing a quantitative feature matrix by setting features based on similar motifs or assigning scores to the contribution of each motif, the affinity score prediction model extracts features by itself, including deeper features, which improves the accuracy of the affinity score. Whether the peptide fragment sequence is a tumor neoantigen is then determined from the affinity value. This addresses the low accuracy and efficiency of determining tumor neoantigens, improves both, and reduces software implementation complexity and labor cost.
Example two
Fig. 4 is a flowchart of a tumor neoantigen prediction method according to the second embodiment of the present invention. This embodiment refines the first embodiment: determining whether the peptide fragment sequence is a tumor neoantigen according to the affinity score includes determining the peptide fragment sequence to be a tumor neoantigen when the affinity score reaches a preset threshold. In the prior art, tumor neoantigens are generally predicted from tumor expression data. Because the affinities of mutant peptides and of the corresponding normal peptides for leukocyte antigens are not compared, this screening can produce false positives. Determining whether the peptide fragment sequence is a tumor neoantigen from the affinity score can improve the accuracy of tumor neoantigen prediction.
As shown in fig. 4, the method specifically includes the following steps:
S210, obtaining a leukocyte antigen sequence and a peptide fragment sequence of the tumor patient.
S220, inputting the leukocyte antigen sequence and the peptide fragment sequence into a trained affinity score prediction model to obtain the affinity score of the leukocyte antigen sequence and the peptide fragment sequence.
S230, when the affinity score reaches a preset threshold value, determining the peptide fragment sequence as a tumor neoantigen.
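Steps S210 to S230 can be sketched as a small Python function. This is an illustration under stated assumptions, not the DeepTNA implementation. The trained model is abstracted as any callable `score_fn(hla_sequence, peptide)`. `toy_score` is a hypothetical stand-in with no biological meaning. "Reaches a preset threshold" is read as greater than or equal to.

```python
from typing import Callable, Dict, List

# The trained affinity score prediction model, abstracted as (hla_sequence, peptide) -> score.
ScoreFn = Callable[[str, str], float]


def predict_neoantigens(hla_sequence: str, peptides: List[str],
                        score_fn: ScoreFn, threshold: float) -> Dict[str, bool]:
    """S220 + S230: score every peptide and flag those whose score reaches the threshold."""
    results: Dict[str, bool] = {}
    for peptide in peptides:
        score = score_fn(hla_sequence, peptide)   # S220: affinity score from the model
        results[peptide] = score >= threshold      # S230: "reaches" read as >=
    return results


def toy_score(hla_sequence: str, peptide: str) -> float:
    """Hypothetical stand-in for the trained model (no biological meaning):
    fraction of peptide residues that also occur in the HLA sequence."""
    if not peptide:
        raise ValueError("peptide must be non-empty")
    hla_residues = set(hla_sequence)
    return sum(aa in hla_residues for aa in peptide) / len(peptide)


# S210 would supply these sequences; the strings here are toy values.
print(predict_neoantigens("ACDEFG", ["ACDE", "ACWW"], toy_score, threshold=0.75))
# {'ACDE': True, 'ACWW': False}
```

Passing the model in as a callable keeps the thresholding logic independent of how the model itself is built or loaded.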
Whether the obtained peptide fragment sequence is a tumor neoantigen is determined from the affinity score of the leukocyte antigen sequence and the peptide fragment sequence output by the affinity score prediction model. A tumor neoantigen is generally a mutant peptide with higher affinity for leukocyte antigens than the normal peptide. Optionally, an affinity score threshold is preset. When the affinity score of the current peptide fragment sequence for the leukocyte antigen reaches the threshold, the current peptide fragment sequence is determined to be a tumor neoantigen. When the score is below the threshold, it is determined not to be a tumor neoantigen. Prior-art methods that predict tumor neoantigens from tumor expression data produce false positives during screening. Determining whether the peptide fragment sequence is a tumor neoantigen from the affinity score can improve prediction accuracy and reduce such false positives.
Taking the existing, highly accurate Average Relative Binding (ARB) model as an example, Fig. 5 shows the prediction accuracy for each neoantigen type in the test set. It compares the ARB model with the tumor neoantigen prediction model provided in this embodiment, the DeepTNA model. As Fig. 5 shows, DeepTNA is more accurate than ARB for most tumor neoantigen types in the test set and slightly less accurate for only two types. This indicates that the method of this embodiment predicts tumor neoantigens well.
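Per-type accuracy of the kind compared in Fig. 5 can be computed as shown here. The record layout `(type, true_label, predicted_label)` and the function names are assumptions. The example values are toy numbers, not the results in Fig. 5.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def accuracy_by_type(records: Iterable[Tuple[str, bool, bool]]) -> Dict[str, float]:
    """records: (neoantigen_type, true_label, predicted_label). Returns accuracy per type."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for kind, truth, pred in records:
        total[kind] += 1
        correct[kind] += int(truth == pred)
    return {kind: correct[kind] / total[kind] for kind in total}


def types_where_better(acc_a: Dict[str, float], acc_b: Dict[str, float]) -> List[str]:
    """Types present in both models' results on which model A is strictly more accurate."""
    return sorted(k for k in acc_a if k in acc_b and acc_a[k] > acc_b[k])


# Toy records, not the Fig. 5 data.
acc = accuracy_by_type([("t1", True, True), ("t1", False, True), ("t2", True, True)])
print(acc)  # {'t1': 0.5, 't2': 1.0}
```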
According to the technical solution of this embodiment, a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient are obtained and input into a trained affinity score prediction model to obtain their affinity score, so the affinity score is determined automatically. No quantitative feature matrix needs to be constructed by setting features based on similar motifs or by assigning score contributions to each motif. The affinity score prediction model extracts features, including deeper features, automatically, which improves the accuracy of the affinity score. When the affinity score reaches a preset threshold, the peptide fragment sequence is determined to be a tumor neoantigen. This improves the accuracy and efficiency of tumor neoantigen prediction, reduces false positives in tumor neoantigen screening, and reduces labor cost.
Example three
Fig. 6 is a structural diagram of a tumor neoantigen prediction apparatus according to a third embodiment of the present invention, the tumor neoantigen prediction apparatus including: a sequence acquisition module 310, an affinity score acquisition module 320, and a tumor neoantigen determination module 330.
The sequence acquisition module 310 is configured to acquire a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient. The affinity score acquisition module 320 is configured to input the leukocyte antigen sequence and the peptide fragment sequence into a trained affinity score prediction model to obtain their affinity score. The tumor neoantigen determination module 330 is configured to determine whether the peptide fragment sequence is a tumor neoantigen according to the affinity score.
Optionally, the affinity score prediction model comprises an encoder and a decoder.
In the technical solution of the above embodiment, the affinity score acquisition module 320 includes:
a sequence code generating unit, configured to encode the leukocyte antigen sequence and the peptide fragment sequence with the encoder to obtain a sequence code;
an affinity score determination unit, configured to determine, with the decoder, the affinity score from the sequence code.
In the technical solution of the above embodiment, the sequence code generating unit includes:
a spliced sequence generating subunit, configured to splice (concatenate) the leukocyte antigen sequence and the peptide fragment sequence with the encoder to obtain a spliced sequence;
a sequence code generating subunit, configured to encode the spliced sequence with the encoder to obtain the sequence code.
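Here is a minimal sketch of the splicing and encoding steps. It assumes one-hot encoding over the 20 standard amino acids plus a hypothetical separator token `-`. The patent does not specify the encoder's input representation or its learned layers. A fixed one-hot matrix stands in for the sequence code purely for illustration.

```python
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard amino acids
SEP = "-"                               # hypothetical separator token between the two sequences
VOCAB: Dict[str, int] = {tok: i for i, tok in enumerate(AMINO_ACIDS + SEP)}


def splice(hla_sequence: str, peptide: str) -> str:
    """Concatenate the leukocyte antigen sequence and the peptide fragment sequence."""
    return hla_sequence + SEP + peptide


def one_hot_encode(sequence: str) -> List[List[int]]:
    """One row per token with a single 1 at the token's vocabulary index.
    Raises KeyError for characters outside the vocabulary."""
    rows = []
    for tok in sequence:
        row = [0] * len(VOCAB)
        row[VOCAB[tok]] = 1
        rows.append(row)
    return rows


code = one_hot_encode(splice("ACD", "KL"))
print(len(code), len(code[0]))  # 6 21
```

In a learned encoder, an embedding layer or recurrent/attention layers would normally transform such a representation further. The separator simply lets downstream layers tell where the HLA sequence ends and the peptide begins.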
Optionally, the decoder is constructed based on a gating mechanism and an attention mechanism.
In the technical solution of the above embodiment, the affinity score determination unit includes:
a target sequence feature extraction subunit, configured to extract, with the decoder, target sequence features from the sequence code;
an affinity score determination subunit, configured to determine the affinity score according to the target sequence features.
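One common way to turn per-position target sequence features into a single affinity score is mean pooling followed by a linear layer and a sigmoid. The sketch below assumes that design. The patent does not state how the score is computed, and `weights` and `bias` stand in for learned parameters.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def affinity_from_features(features: Sequence[Sequence[float]],
                           weights: Sequence[float], bias: float) -> float:
    """Mean-pool per-position feature vectors, apply a linear layer, squash to (0, 1)."""
    if not features:
        raise ValueError("features must contain at least one position")
    n, dim = len(features), len(features[0])
    pooled = [sum(row[j] for row in features) / n for j in range(dim)]
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return sigmoid(logit)


print(affinity_from_features([[1.0, 0.0], [3.0, 2.0]], weights=[0.5, -1.0], bias=0.0))  # 0.5
```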
In the technical solution of the above embodiment, the target sequence feature extraction subunit includes:
a sequence feature extraction subunit, configured to extract, with the decoder, sequence features from the sequence code based on the gating mechanism;
a screened sequence feature acquisition subunit, configured to screen, with the decoder, the sequence features based on the attention mechanism to obtain screened sequence features;
a feature extraction subunit, configured to perform, with the decoder, feature extraction on the screened sequence features based on the gating mechanism to obtain the target sequence features.
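The decoder's gate, attention and gate sequence can be sketched in plain Python. The sketch assumes a gated linear unit (GLU) as the gating mechanism and scaled dot-product self-attention with no learned projections. The patent names only "a gating mechanism" and "an attention mechanism", so these concrete choices and all function names are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]  # one feature vector per sequence position


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def glu(x: Matrix) -> Matrix:
    """Gated linear unit: split each vector into halves a, b and output a * sigmoid(b).
    Halves the feature width, so the width must be even."""
    out = []
    for row in x:
        half = len(row) // 2
        out.append([a * sigmoid(b) for a, b in zip(row[:half], row[half:2 * half])])
    return out


def softmax(v: List[float]) -> List[float]:
    m = max(v)
    e = [math.exp(t - m) for t in v]
    s = sum(e)
    return [t / s for t in e]


def self_attention(x: Matrix) -> Matrix:
    """Scaled dot-product self-attention with queries = keys = values = x (no learned projections)."""
    d = len(x[0])
    out = []
    for q in x:
        weights = softmax([sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x])
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


def decode(sequence_code: Matrix) -> Matrix:
    """Gate -> attention -> gate, mirroring the three decoder subunits.
    Two GLU layers halve the width twice, so the input width must be a multiple of 4."""
    features = glu(sequence_code)        # sequence features (gating)
    screened = self_attention(features)  # screened sequence features (attention)
    return glu(screened)                 # target sequence features (gating)


target = decode([[1.0, 2.0, 0.0, 0.0], [0.5, -1.0, 0.0, 0.0]])
print(len(target), len(target[0]))  # 2 1
```

A trained decoder would apply learned linear projections before each gate and inside the attention step. They are omitted here so that the data flow stays visible.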
In the technical solution of the above embodiment, the tumor neoantigen determination module 330 is specifically configured to determine that the peptide fragment sequence is a tumor neoantigen when the affinity score reaches a preset threshold.
According to the technical solution of this embodiment, a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient are obtained and input into a trained affinity score prediction model to obtain their affinity score, so the affinity score is determined automatically. No quantitative feature matrix needs to be constructed by setting features based on similar motifs or by assigning score contributions to each motif. The affinity score prediction model extracts features, including deeper features, automatically, which improves the accuracy of the affinity score. Whether the peptide fragment sequence is a tumor neoantigen is then determined from the affinity score. This reduces false positives in tumor neoantigen screening, improves the accuracy and efficiency of identifying tumor neoantigens, and reduces program complexity and labor cost.
The tumor neoantigen prediction apparatus provided by this embodiment can execute the tumor neoantigen prediction method provided by any embodiment of the present invention. It has the functional modules and beneficial effects corresponding to that method.
Example four
Fig. 7 is a schematic structural diagram of a tumor neoantigen prediction device according to the fourth embodiment of the present invention. As shown in Fig. 7, the device includes a processor 410, a memory 420, an input device 430, and an output device 440. The device may have one or more processors 410; one processor 410 is taken as an example in Fig. 7. The processor 410, memory 420, input device 430, and output device 440 may be connected by a bus or by other means; Fig. 7 takes a bus connection as an example.
The memory 420, as a computer-readable storage medium, can be used to store software programs, computer-executable programs and modules. Examples are the program instructions/modules corresponding to the tumor neoantigen prediction method in the embodiments of the present invention, such as the sequence acquisition module 310, the affinity score acquisition module 320, and the tumor neoantigen determination module 330 in the tumor neoantigen prediction apparatus. The processor 410 runs the software programs, instructions and modules stored in the memory 420. In doing so, it executes the various functional applications and data processing of the tumor neoantigen prediction device and implements the tumor neoantigen prediction method described above.
The memory 420 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function. The data storage area may store data created through use of the terminal, and the like. In addition, the memory 420 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some examples, the memory 420 may further include memory located remotely from the processor 410 and connected to the tumor neoantigen prediction device over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 430 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the tumor neoantigen prediction device. The output device 440 may include a display device such as a display screen.
Example five
Embodiments of the present invention also provide a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a method for tumor neoantigen prediction, the method comprising:
obtaining a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient;
inputting the leukocyte antigen sequence and the peptide fragment sequence into a trained affinity score prediction model to obtain the affinity score of the leukocyte antigen sequence and the peptide fragment sequence; and
determining whether the peptide fragment sequence is a tumor neoantigen according to the affinity score.
Of course, in the storage medium containing computer-executable instructions provided by the embodiments of the present invention, the instructions are not limited to the method operations described above. They can also perform related operations in the tumor neoantigen prediction method provided by any embodiment of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the embodiment of the tumor neoantigen prediction apparatus, the included units and modules are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be realized; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.
Claims (10)
1. A method for predicting a neoantigen of a tumor, comprising:
obtaining a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient;
inputting the leukocyte antigen sequence and the peptide fragment sequence into a trained affinity score prediction model to obtain the affinity score of the leukocyte antigen sequence and the peptide fragment sequence; and
determining whether the peptide fragment sequence is a tumor neoantigen according to the affinity score.
2. The method of claim 1, wherein the affinity score prediction model comprises an encoder and a decoder;
inputting the leukocyte antigen sequence and the peptide fragment sequence into a trained affinity score prediction model to obtain the affinity score of the leukocyte antigen sequence and the peptide fragment sequence comprises:
encoding the leukocyte antigen sequence and the peptide fragment sequence with the encoder to obtain a sequence encoding; and
determining, by the decoder, the affinity score from the sequence encoding.
3. The method of claim 2, wherein encoding the leukocyte antigen sequence and the peptide fragment sequence with the encoder to obtain the sequence encoding comprises:
splicing (concatenating) the leukocyte antigen sequence and the peptide fragment sequence with the encoder to obtain a spliced sequence; and
encoding the spliced sequence with the encoder to obtain the sequence encoding.
4. The method of claim 2, wherein the decoder is constructed based on a gating mechanism and an attention mechanism.
5. The method of claim 4, wherein determining, by the decoder, the affinity score from the sequence encoding comprises:
extracting, by the decoder, target sequence features from the sequence encoding; and
determining the affinity score according to the target sequence features.
6. The method of claim 5, wherein extracting, by the decoder, the target sequence features from the sequence encoding comprises:
extracting, by the decoder, sequence features from the sequence encoding based on the gating mechanism;
screening, by the decoder, the sequence features based on the attention mechanism to obtain screened sequence features; and
performing, by the decoder, feature extraction on the screened sequence features based on the gating mechanism to obtain the target sequence features.
7. The method of claim 1, wherein determining whether the peptide fragment sequence is a tumor neoantigen based on the affinity score comprises:
determining the peptide fragment sequence to be a tumor neoantigen when the affinity score reaches a preset threshold.
8. A tumor neoantigen prediction device, comprising:
a sequence acquisition module, configured to acquire a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient;
an affinity score acquisition module, configured to input the leukocyte antigen sequence and the peptide fragment sequence into a trained affinity score prediction model to obtain the affinity score of the leukocyte antigen sequence and the peptide fragment sequence; and
a tumor neoantigen determination module, configured to determine whether the peptide fragment sequence is a tumor neoantigen according to the affinity score.
9. A tumor neoantigen determination device, characterized by comprising:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the tumor neoantigen prediction method according to any one of claims 1-7.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the tumor neoantigen prediction method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110303245.1A CN112908421B (en) | 2021-03-22 | 2021-03-22 | Tumor neogenesis antigen prediction method, device, equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112908421A | 2021-06-04 |
CN112908421B | 2024-02-06 |
Family
ID=76105914
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110303245.1A Active CN112908421B (en) | 2021-03-22 | 2021-03-22 | Tumor neogenesis antigen prediction method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112908421B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114561472A (en) * | 2022-04-27 | 2022-05-31 | 普瑞基准科技(北京)有限公司 | Kit for detecting or assisting in detecting tumor-related gene variation and application thereof |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180203852A1 (en) * | 2017-01-18 | 2018-07-19 | Xerox Corporation | Natural language generation through character-based recurrent neural networks with finite-state prior knowledge |
CN109671469A (en) * | 2018-12-11 | 2019-04-23 | 浙江大学 | Method for predicting binding relationship and binding affinity between polypeptides and HLA class I molecules based on a recurrent neural network |
CN109706065A (en) * | 2018-12-29 | 2019-05-03 | 深圳裕策生物科技有限公司 | Tumor neogenetic antigen load detection device and storage medium |
US20200243164A1 (en) * | 2019-01-30 | 2020-07-30 | Bioinformatics Solutions Inc. | Systems and methods for patient-specific identification of neoantigens by de novo peptide sequencing for personalized immunotherapy |
KR102159921B1 (en) * | 2020-03-24 | 2020-09-25 | 주식회사 테라젠바이오 | Method for predicting neoantigen using a peptide sequence and hla allele sequence and computer program |
CN111815614A (en) * | 2020-07-17 | 2020-10-23 | 中国人民解放军军事科学院军事医学研究院 | Parasite detection method and system based on artificial intelligence and terminal equipment |
US20200365270A1 (en) * | 2019-05-15 | 2020-11-19 | International Business Machines Corporation | Drug efficacy prediction for treatment of genetic disease |
KR102184720B1 (en) * | 2019-10-11 | 2020-11-30 | 한국과학기술원 | Prediction method for binding preference between mhc and peptide on cancer cell and analysis apparatus |
US20210041454A1 (en) * | 2019-08-09 | 2021-02-11 | Immatics US, Inc. | Methods for peptide mass spectrometry fragmentation prediction |
Also Published As
Publication number | Publication date |
---|---|
CN112908421B (en) | 2024-02-06 |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |