CN112908421B - Tumor neogenesis antigen prediction method, device, equipment and medium - Google Patents


Info

Publication number
CN112908421B
Authority
CN
China
Prior art keywords
sequence
peptide fragment
leukocyte antigen
affinity score
tumor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110303245.1A
Other languages
Chinese (zh)
Other versions
CN112908421A (en
Inventor
彭鑫鑫
米玉涛
季序我
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Precision Scientific Technology Beijing Co ltd
Predatum Biomedicine Suzhou Co ltd
Original Assignee
Precision Scientific Technology Beijing Co ltd
Predatum Biomedicine Suzhou Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Precision Scientific Technology Beijing Co ltd and Predatum Biomedicine Suzhou Co ltd
Priority to CN202110303245.1A
Publication of CN112908421A
Application granted
Publication of CN112908421B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/01 Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

Embodiments of the invention disclose a tumor neoantigen prediction method, apparatus, device, and medium. The method comprises: obtaining a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient; inputting the leukocyte antigen sequence and the peptide fragment sequence into a trained affinity score prediction model to obtain an affinity score for the two sequences; and determining, based on the affinity score, whether the peptide fragment sequence is a tumor neoantigen. Under this technical scheme, the affinity score prediction model extracts features automatically and can extract deeper features, which improves the accuracy of the affinity score. This addresses the low accuracy and efficiency of tumor neoantigen determination, improving both while reducing labor cost.

Description

Tumor neogenesis antigen prediction method, device, equipment and medium
Technical Field
Embodiments of the invention relate to the technical field of bioinformatics, and in particular to a tumor neoantigen prediction method, apparatus, device, and medium.
Background
Tumor neoantigens are non-self novel protein polypeptides that are recognized by human antigen-presenting cells and are not otherwise present in the human body. They are formed mainly by the apoptosis of mutant proteins produced by tumor-cell mutations, and they are a key factor in stimulating the body's immune system to mount an immune response against tumor cells.
Current methods for predicting tumor neoantigens fall into three main categories: the first is structure-based; the second predicts the binding affinity of peptide fragments from position-specific scoring matrices; the third is based on machine learning.
Structure-based methods work by calculating the minimum free energy of the peptide-HLA complex, but because the number of available crystal structures is limited, prediction is very slow and inaccurate. Methods based on position-specific scoring matrices have linear computational complexity, much lower than that of structure-based and machine-learning-based methods, but they require features to be set for similar motifs and a scoring function to be constructed for each specific position. This depends on expert experience, makes the process complicated and tedious, and yields low accuracy. Machine-learning-based methods, which use support vector machines, hidden Markov models, simple neural networks, and the like, provide good prediction tools as measured by average AUC over a large number of HLA types. However, they require constructing a quantitative matrix that accounts for the contribution of each residue at each position in the peptide and then feeding it into the machine learning model. The residue contribution scores must be reconsidered repeatedly during this process, which demands a high level of expertise and experience. These methods also cannot extract high-level features automatically, are not accurate enough for the minority HLA types present in most populations, and, because of their nonlinearity, are inefficient when predicting the large number of peptides generated from whole-genome and transcriptome sequencing data. Combined approaches improve performance, but the result is still unsatisfactory.
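To make the second category concrete, the following is a minimal sketch of position-specific scoring, assuming a tiny hand-written matrix. The matrix values, the three-residue peptide length, and the function name `pssm_score` are all hypothetical and serve only to show why the cost is linear in peptide length and why each position needs manually chosen scores.

```python
# Hypothetical position-specific scoring matrix for 3-residue peptides.
# The numbers are invented for illustration; real matrices are built by experts.
PSSM = [
    {"A": 0.5, "L": 1.2, "K": -0.3},   # contributions at position 1
    {"A": -0.1, "L": 0.4, "K": 0.9},   # contributions at position 2
    {"A": 0.2, "L": 1.5, "K": -0.8},   # contributions at position 3
]

def pssm_score(peptide, pssm, default=0.0):
    """Sum the per-position contribution of each residue (one lookup per position)."""
    if len(peptide) != len(pssm):
        raise ValueError("peptide length must match the matrix length")
    return sum(column.get(residue, default) for residue, column in zip(peptide, pssm))

score = pssm_score("LKL", PSSM)
weak_score = pssm_score("KAK", PSSM)
```

Every entry of such a matrix has to be supplied by hand or fitted per HLA type, which is the manual effort the text describes.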
Disclosure of Invention
Embodiments of the invention provide a tumor neoantigen prediction method, apparatus, device, and medium that improve the accuracy of tumor neoantigen determination and reduce labor cost.
In a first aspect, an embodiment of the present invention provides a tumor neoantigen prediction method, including:
obtaining leukocyte antigen sequences and peptide fragment sequences of tumor patients;
inputting the leukocyte antigen sequence and the peptide fragment sequence into a trained affinity score prediction model to obtain affinity scores of the leukocyte antigen sequence and the peptide fragment sequence;
determining whether the peptide fragment sequence is a tumor neoantigen based on the affinity score.
In a second aspect, an embodiment of the present invention further provides a tumor neoantigen predicting apparatus, including:
the sequence acquisition module is used for acquiring a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient;
the affinity score acquisition module is used for inputting the leukocyte antigen sequence and the peptide fragment sequence into a trained prediction model to obtain affinity scores of the leukocyte antigen sequence and the peptide fragment sequence;
and the tumor neoantigen determination module is used for determining whether the peptide fragment sequence is the tumor neoantigen according to the affinity scores.
In a third aspect, an embodiment of the present invention further provides a tumor neoantigen predicting apparatus, wherein the tumor neoantigen predicting apparatus includes:
one or more processors;
a storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the tumor neoantigen prediction method as provided by any embodiment of the present invention.
In a fourth aspect, embodiments of the present invention also provide a computer-readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the tumor neoantigen prediction method as provided in any of the embodiments of the present invention.
According to the technical scheme of the embodiments, a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient are obtained, and the two sequences are input into a trained affinity score prediction model to obtain their affinity score. The affinity score is thus determined automatically: the affinity score prediction model extracts features itself, with no need to set features based on similar motifs or to assign scores to the contribution of each motif, and it can extract deeper features, which improves the accuracy of the affinity score. Whether the peptide fragment sequence is a tumor neoantigen is then determined from the affinity score. This solves the problem of low accuracy and efficiency in tumor neoantigen determination, improving both while reducing labor cost.
Drawings
FIG. 1 is a flow chart of a tumor neoantigen prediction method according to a first embodiment of the present invention;
FIG. 2 is a diagram comparing the prediction accuracy and loss values of the tumor neoantigen deep learning model DeepTNA on the training set and the validation set according to the first embodiment of the present invention;
FIG. 3 is a schematic diagram of a tumor neoantigen DeepTNA model in accordance with an embodiment of the present invention;
FIG. 4 is a flowchart of a tumor neoantigen prediction method according to a second embodiment of the present invention;
FIG. 5 is a diagram comparing the accuracy of tumor neoantigen prediction against an existing prediction model in the second embodiment of the present invention;
FIG. 6 is a block diagram of a tumor neoantigen predicting apparatus according to a third embodiment of the present invention;
FIG. 7 is a schematic diagram showing a structure of a tumor neoantigen predicting apparatus according to a fourth embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Example 1
Fig. 1 is a flowchart of a tumor neoantigen prediction method according to an embodiment of the present invention, where the embodiment is applicable to predicting a tumor neoantigen, and the method may be performed by a tumor neoantigen prediction apparatus, and specifically includes the following steps:
S110, obtaining leukocyte antigen sequences and peptide fragment sequences of tumor patients.
Currently, tumor neoantigens can be predicted from the affinity between the leukocyte antigen sequence and the peptide fragment sequences of a tumor patient. Human leukocyte antigens are highly polymorphic alloantigens. Chemically they are a class of glycoproteins, formed by the non-covalent association of a glycosylated alpha heavy chain and a beta light chain. The amino terminus of the peptide chain faces outward (accounting for about 3/4 of the whole molecule), the carboxyl terminus extends into the cytoplasm, and the hydrophobic middle portion lies within the membrane. Peptide fragments are chain-like molecules formed by the dehydration condensation of amino acids. Optionally, the leukocyte antigen sequence and the peptide fragment sequence of the tumor patient are obtained from the Immune Epitope Database (IEDB) or from the relevant literature, so that their affinity score can be obtained and used to determine whether the peptide fragment sequence is a tumor neoantigen.
S120, inputting the leukocyte antigen sequence and the peptide fragment sequence into a trained affinity score prediction model to obtain affinity scores of the leukocyte antigen sequence and the peptide fragment sequence.
Once the obtained leukocyte antigen sequence and peptide fragment sequence are input into the trained affinity score prediction model, their affinity score is obtained directly. The technical scheme of this embodiment reduces labor cost, and its accuracy is greatly improved over that obtained by directly splicing the leukocyte antigen sequence and the peptide fragment sequence and feeding them into a simple neural network or an RNN model. Nor do the acquired sequences need to be processed into a regular two-dimensional matrix and then fed into a trained deep learning network such as a convolutional neural network to obtain the affinity score, which reduces the complexity of data processing. FIG. 2 shows the affinity accuracy and training-set loss obtained by inputting the training-set samples into the affinity prediction model being trained, together with the affinity accuracy and validation-set loss obtained by inputting the validation-set samples into the trained affinity prediction model. As can be seen from FIG. 2, the validation-set and training-set losses are both very small: the training loss is 0.26 and the validation loss is only 0.34. The validation accuracy is also very close to the training accuracy: the training accuracy is 89% and the validation accuracy reaches 87%. The deep learning model DeepTNA therefore shows no over-fitting or under-fitting and has high accuracy, so the resulting affinity prediction model DeepTNA can predict the affinity score accurately.
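The comparison between training and validation figures can be sketched as follows. This is an illustrative calculation only: the text does not name the loss function, so binary cross-entropy is an assumption, and the helper names, the toy labels, and the 0.5 cutoff are hypothetical. The four numeric inputs to `generalization_gap` are the values reported above for FIG. 2.

```python
import math

def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy (assumed loss; the source does not specify one)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)          # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

def accuracy(labels, probs, cutoff=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    hits = sum(1 for y, p in zip(labels, probs) if (p >= cutoff) == bool(y))
    return hits / len(labels)

def generalization_gap(train_metric, val_metric):
    """Absolute difference between a training metric and its validation counterpart."""
    return abs(train_metric - val_metric)

# Toy predictions, invented for illustration.
toy_labels = [1, 0, 1, 1]
toy_probs = [0.9, 0.2, 0.4, 0.7]
toy_accuracy = accuracy(toy_labels, toy_probs)
toy_loss = binary_cross_entropy(toy_labels, toy_probs)

# Figures reported in the text: loss 0.26 vs 0.34, accuracy 89% vs 87%.
loss_gap = generalization_gap(0.26, 0.34)
accuracy_gap = generalization_gap(0.89, 0.87)
```

A small gap on both metrics is what the text relies on when it concludes that the model neither over-fits nor under-fits.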
Optionally, as shown in FIG. 3, the affinity score prediction model includes an encoder and a decoder. Inputting the leukocyte antigen sequence and the peptide fragment sequence into the trained affinity score prediction model to obtain their affinity score comprises: encoding the leukocyte antigen sequence and the peptide fragment sequence with the encoder to obtain a sequence code; and determining, by the decoder, the affinity score from the sequence code. To reduce labor cost, features need not be set based on similar motifs and scores need not be assigned to the contribution of each motif, which saves the cost of bringing in experts and of manually setting or labeling features. Processing peptide fragments into a regular two-dimensional matrix is also avoided, which reduces the complexity of data processing. The leukocyte antigen sequence and the peptide fragment sequence are input into a trained deep learning model based on the encoder-decoder structure, and their affinity score is obtained.
The encoder in the affinity score prediction model encodes the leukocyte antigen sequence and the peptide fragment sequence, converting them into a code that a computer can recognize. The encoded sequences are passed to the decoder of the model, which predicts the affinity score from them. Compared with constructing a quantitative feature matrix by setting features based on similar motifs or on the contribution of each motif and then calculating the affinity score from those features, this improves the efficiency of affinity score prediction. Further, when the affinity score prediction model is trained, the length range of the training-sample peptide fragments is widened; optionally, the range may include 8-15mer. The trained affinity score prediction model therefore covers a wider range of peptide fragments at prediction time.
Optionally, encoding the leukocyte antigen sequence and the peptide fragment sequence with the encoder to obtain the sequence code comprises: splicing the leukocyte antigen sequence and the peptide fragment sequence with the encoder to obtain a spliced sequence; and encoding the spliced sequence with the encoder to obtain the sequence code. As shown in FIG. 3, before encoding, the sequence features of the leukocyte antigen sequence and the peptide fragment sequence are input to the integration layer, which splices them into one long vector. The spliced sequence is then input to the encoding layer and encoded, so the leukocyte antigen sequence and the peptide fragment sequence do not need to be encoded separately, which reduces the complexity of the encoding operation. Once the spliced sequence is encoded, the leukocyte antigen sequence and the peptide fragment sequence are in a form a computer can recognize, which makes the subsequent decoding by the decoder straightforward.
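The splice-then-encode step might look like the sketch below. The patent does not disclose the actual encoding scheme, so the integer vocabulary, the zero padding, and the function name `splice_and_encode` are assumptions; the example HLA string is an arbitrary placeholder rather than a real allele sequence. Only the 8-15mer peptide length range comes from the text.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"                         # 20 standard residues
PAD = 0                                                      # assumed padding token
TOKEN_ID = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}   # residue -> integer id

def splice_and_encode(hla_seq, peptide, min_len=8, max_len=15):
    """Concatenate the HLA sequence and the peptide, then map residues to integers.

    The peptide part is right-padded to max_len so every encoded pair for a
    given HLA sequence has the same length.
    """
    if not min_len <= len(peptide) <= max_len:
        raise ValueError(f"peptide length must be within {min_len}-{max_len}")
    spliced = hla_seq + peptide                      # one long sequence
    ids = [TOKEN_ID[aa] for aa in spliced]           # KeyError on unknown residues
    ids += [PAD] * (max_len - len(peptide))          # pad to a fixed length
    return ids

hla_placeholder = "YFAMYQEN"     # arbitrary placeholder string
peptide_example = "SIINFEKL"     # an 8-mer used only as sample input
encoded = splice_and_encode(hla_placeholder, peptide_example)
```

Encoding the spliced sequence once, rather than each sequence separately, mirrors the single pass through the integration and encoding layers described above.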
Optionally, the decoder is built on a gating mechanism and an attention mechanism. In a neural network, the earlier hidden layers learn more slowly than the later hidden layers, so classification accuracy can fall as the number of hidden layers increases. This phenomenon is called the vanishing gradient problem. The attenuation of gradients is caused by successive multiplication. Conversely, if very large values occur in the successive multiplication, the final gradient is very large; when the optimization reaches a cliff, updating with such a gradient makes the step of that iteration so large that the parameters may jump out of the reasonable region at once. This phenomenon is called gradient explosion. A gating mechanism mitigates both vanishing and exploding gradients. The GRU network is a well-performing variant of the LSTM network that is structurally simpler than the LSTM network while remaining effective, so the GRU network is taken as the preferred network for mitigating gradient explosion and gradient vanishing.
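As a rough illustration of the gating idea, the sketch below implements one step of a standard GRU cell in plain Python. It is not the DeepTNA decoder: the weights are random and untrained, bias terms are omitted, the sizes are arbitrary, and the update convention h' = (1 - z) * h + z * candidate is one of the two common formulations. All names are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def gru_step(x, h, p):
    """One GRU update: the gates decide how much of the old state is kept."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]  # update gate
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]  # reset gate
    gated_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b)
            for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], gated_h))]        # candidate state
    # Convex combination of old state and candidate keeps values bounded.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
input_size, hidden_size = 4, 3
params = {}
for gate in ("z", "r", "h"):
    params["W" + gate] = random_matrix(hidden_size, input_size, rng)
    params["U" + gate] = random_matrix(hidden_size, hidden_size, rng)

# Run the cell over a random 10-step input sequence.
sequence = [[rng.uniform(-1.0, 1.0) for _ in range(input_size)] for _ in range(10)]
hidden = [0.0] * hidden_size
for x in sequence:
    hidden = gru_step(x, hidden, params)
```

Because each new state is an interpolation between the previous state and a tanh-bounded candidate, the hidden values stay within [-1, 1], which is part of why gated cells behave more stably over long sequences than a plain recurrent update.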
The attention mechanism mimics the internal process of biological observation: it aligns internal experience with external sensation to increase the fineness of observation in particular regions. It gives the neural network the ability to focus on a subset of its inputs (or features), so that the important features of sparse data can be extracted quickly. The attention mechanism can also reduce the dimensionality of the input data, so that the prediction output by the model concentrates on the key parts selected by attention, which improves the efficiency and accuracy with which the neural network model processes data.
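A minimal sketch of attention-based screening follows, assuming simple dot-product scoring against a query vector. The patent does not specify which attention variant is used, so the scoring rule, the toy feature values, and the names `attend` and `query` are assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(features, query):
    """Weight each position's feature vector by its relevance to the query.

    Returns the attention weights and the weighted sum (context vector), which
    collapses a sequence of feature vectors into a single vector.
    """
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [sum(w * feat[d] for w, feat in zip(weights, features)) for d in range(dim)]
    return weights, context

# Three sequence positions with 2-dimensional features (invented values).
features = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
query = [1.0, 1.0]
weights, context = attend(features, query)
```

Here the second position receives most of the weight, so the context vector is dominated by it; this is the sense in which attention both focuses on a subset of the input and reduces its dimensionality.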
Optionally, determining, by the decoder, the affinity score from the sequence code comprises: extracting, by the decoder, target sequence features from the sequence code; and determining the affinity score from the target sequence features. The encoded leukocyte antigen sequence and peptide fragment sequence are input to the decoder of the affinity score prediction model, which performs feature extraction on them automatically. Compared with setting features based on similar motifs or assigning scores to the contribution of each motif, extracting features with the decoder yields features of the leukocyte antigen sequence and the peptide fragment sequence that carry deeper, higher-level meaning, which improves the efficiency and accuracy of affinity score prediction.
Optionally, extracting, by the decoder, the target sequence features from the sequence code comprises: extracting sequence features from the sequence code by the decoder based on the gating mechanism; screening the sequence features by the decoder based on the attention mechanism to obtain screened sequence features; and performing feature extraction on the screened sequence features by the decoder based on the gating mechanism to obtain the target sequence features. The decoder first extracts features from the code of the leukocyte antigen sequence and the peptide fragment sequence. The extracted sequence features are then screened with the attention mechanism, so that the features with the greatest influence on the affinity score are selected and feature extraction becomes more efficient. Features are then extracted again from the screened features, mining and analyzing the most influential ones further to obtain the target sequence features. The affinity score of the leukocyte antigen sequence and the peptide fragment sequence is obtained from the target sequence features. Illustratively, the target sequence features are input into a sigmoid function, an S-shaped function that maps the target sequence features into [0,1], giving an affinity score in the range [0,1]. Obtaining the affinity score from the target sequence features improves the efficiency and accuracy of the affinity score.
Predicting the affinity score of the leukocyte antigen sequence and the peptide fragment sequence with a deep learning model of encoder-decoder structure allows high-level features of both sequences to be extracted automatically, and the affinity prediction model is also easier to update and deploy. The obtained leukocyte antigen sequence and peptide fragment sequence can be input into the affinity prediction model directly to obtain the affinity score, without data preprocessing, which reduces the complexity of data processing; no manual intervention is needed, which improves the efficiency and accuracy of obtaining the affinity score.
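The final mapping from target sequence features to a score in [0,1] can be sketched as below. The text only says the features are passed through a sigmoid; the linear projection before it, the weights, and the function name `affinity_score` are assumptions added so that a feature vector can be reduced to a single number.

```python
import math

def stable_sigmoid(x):
    """Sigmoid that avoids overflow for large negative or positive inputs."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def affinity_score(target_features, weights, bias=0.0):
    """Project the feature vector to one logit (assumed step), then squash to [0, 1]."""
    logit = sum(w * f for w, f in zip(weights, target_features)) + bias
    return stable_sigmoid(logit)

# Invented feature vectors and weights, for illustration only.
neutral = affinity_score([0.0, 0.0], [1.0, 1.0])
strong = affinity_score([2.0, 1.5], [1.0, 1.0])
weak = affinity_score([-2.0, -1.5], [1.0, 1.0])
extreme = affinity_score([1000.0], [-1.0])
```

Whatever the magnitude of the features, the output stays within [0,1], which is what lets a single preset threshold be applied to every leukocyte antigen and peptide pair.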
S130, determining whether the peptide sequence is a tumor neoantigen according to the affinity score.
Tumor-specific antigens (TSAs) are antigens recognized by T cells. Genomic variants in a tumor are expressed as tumor-specific peptide fragments (neo-peptides), which are defined as neoantigens. Unlike tumor-associated antigens, they are present only in tumor cells. High-quality tumor neoantigens are generally mutant peptides that have a higher affinity for leukocyte antigens than normal peptides do, so whether a peptide fragment sequence is a tumor neoantigen is determined from the affinity score of the leukocyte antigen sequence and the peptide fragment sequence.
According to the technical scheme of this embodiment, a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient are obtained, and the two sequences are input into a trained affinity score prediction model to obtain their affinity score. The affinity score is thus determined automatically, with no need to construct a quantitative feature matrix by setting features based on similar motifs or assigning scores to the contribution of each motif; the affinity score prediction model extracts features itself and can extract deeper features, which improves the accuracy of the affinity score. Whether the peptide fragment sequence is a tumor neoantigen is then determined from the affinity score. This solves the problem of low accuracy and efficiency in tumor neoantigen determination, improving both while reducing the complexity of software implementation and the labor cost.
Example 2
FIG. 4 is a flowchart of a tumor neoantigen prediction method according to the second embodiment of the present invention. Building on the previous embodiment, this embodiment further specifies that determining whether the peptide fragment sequence is a tumor neoantigen according to the affinity score comprises: determining the peptide fragment sequence to be a tumor neoantigen when the affinity score reaches a preset threshold. The prior art generally predicts tumor neoantigens from tumor expression data; because it does not compare the leukocyte antigen affinities of normal peptide fragments and mutant peptides, it can produce false positives when screening tumor neoantigens. Determining whether the peptide fragment sequence is a tumor neoantigen from the affinity score improves the accuracy of tumor neoantigen prediction.
As shown in fig. 4, the method specifically comprises the following steps:
S210, obtaining leukocyte antigen sequences and peptide fragment sequences of tumor patients.
S220, inputting the leukocyte antigen sequence and the peptide fragment sequence into a trained affinity score prediction model to obtain affinity scores of the leukocyte antigen sequence and the peptide fragment sequence.
And S230, determining the peptide sequence as tumor neoantigen when the affinity score reaches a preset threshold value.
Whether the obtained peptide fragment sequence is a tumor neoantigen is determined according to the affinity score of the leukocyte antigen sequence and the peptide fragment sequence output by the affinity score prediction model. Tumor neoantigens are generally mutated peptides that have a higher affinity for leukocyte antigens than normal peptides. Optionally, an affinity score threshold is preset: when the affinity score of the current peptide fragment sequence and the leukocyte antigen is higher than the preset threshold, the current peptide fragment sequence is determined to be a tumor neoantigen; when the affinity score is lower than the preset threshold, the current peptide fragment sequence is determined not to be a tumor neoantigen. In the prior art, tumor neoantigens are predicted from the expression data of the tumor, which can produce false positives in tumor neoantigen screening. Determining whether the peptide fragment sequence is a tumor neoantigen from the affinity score instead can improve the accuracy of tumor neoantigen prediction and helps avoid such false positives. Take the existing Average Relative Binding (ARB) model, which has relatively high accuracy, as an example. Fig. 5 shows the accuracy, for each tumor neoantigen type in the test set, of the ARB model and of the tumor neoantigen prediction model provided in this embodiment, namely the deep tna model. As can be seen from fig. 5, the deep tna model is more accurate than the existing ARB model for most tumor neoantigen types in the test set and is slightly less accurate for only two types, so the method provided in this embodiment predicts tumor neoantigens well.
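The threshold decision of step S230 can be sketched as follows. This is a minimal illustration, not the embodiment's implementation: the function names, the default threshold of 0.5, and the choice to treat a score exactly equal to the threshold as "reaching" it are all assumptions made for the example.

```python
from __future__ import annotations


def is_tumor_neoantigen(affinity_score: float, threshold: float = 0.5) -> bool:
    """Return True when the affinity score reaches the preset threshold.

    The default threshold (0.5) is a hypothetical value; the text only says
    that a threshold is preset. A score equal to the threshold is assumed
    to count as reaching it.
    """
    return affinity_score >= threshold


def screen_peptides(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Keep the peptide fragment sequences whose score reaches the threshold."""
    return [
        peptide
        for peptide, score in scores.items()
        if is_tumor_neoantigen(score, threshold)
    ]
```

In practice the scores would come from the trained affinity score prediction model of step S220, and the threshold would be chosen for that model's score range.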
According to the technical solution of this embodiment, a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient are obtained, and the two sequences are input into a trained affinity score prediction model to obtain their affinity score. The affinity score can thus be determined automatically: there is no need to construct a metering feature matrix by setting features based on similar motifs or by setting scores based on the contribution of each motif, because the affinity score prediction model extracts features automatically and can extract deeper features, which improves the accuracy of the affinity score. When the affinity score reaches a preset threshold, the peptide fragment sequence is determined to be a tumor neoantigen. This improves the accuracy of tumor neoantigen prediction, reduces false positives in tumor neoantigen screening, solves the problem of low accuracy and efficiency in determining tumor neoantigens, and reduces labor cost.
Example III
Fig. 6 is a block diagram of a tumor neoantigen predicting apparatus according to a third embodiment of the present invention, where the tumor neoantigen predicting apparatus includes: a sequence acquisition module 310, an affinity score acquisition module 320, and a tumor neoantigen determination module 330.
The sequence acquisition module 310 is configured to acquire a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient; the affinity score obtaining module 320 is configured to input the leukocyte antigen sequence and the peptide fragment sequence into a trained affinity score prediction model to obtain the affinity score of the leukocyte antigen sequence and the peptide fragment sequence; and the tumor neoantigen determination module 330 is configured to determine whether the peptide fragment sequence is a tumor neoantigen based on the affinity score.
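The division into three modules can be pictured as a small composition of callables. The sketch below is only an illustration of that structure: the class name, the callable signatures, the patient identifier argument, and the default threshold are hypothetical, and the scoring function stands in for the trained affinity score prediction model.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class TumorNeoantigenPredictor:
    # Role of sequence acquisition module 310: returns
    # (leukocyte antigen sequence, peptide fragment sequence).
    acquire: Callable[[str], Tuple[str, str]]
    # Role of affinity score obtaining module 320: stand-in for the trained model.
    score: Callable[[str, str], float]
    # Hypothetical preset threshold used by the determination step.
    threshold: float = 0.5

    def determine(self, affinity_score: float) -> bool:
        """Role of tumor neoantigen determination module 330."""
        return affinity_score >= self.threshold

    def predict(self, patient_id: str) -> bool:
        """Run acquisition, scoring, and determination in order."""
        hla_sequence, peptide_sequence = self.acquire(patient_id)
        return self.determine(self.score(hla_sequence, peptide_sequence))
```

Keeping the three roles as separate callables mirrors the statement that the modules are divided by functional logic only, so any one of them can be replaced without touching the others.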
Optionally, the affinity score prediction model includes an encoder and a decoder;
In the technical solution of the foregoing embodiment, the affinity score obtaining module 320 includes:
a sequence code generating unit, configured to encode the leukocyte antigen sequence and the peptide fragment sequence through the encoder to obtain a sequence code;
an affinity score determining unit, configured to determine, through the decoder, the affinity score according to the sequence code.
In the technical solution of the above embodiment, the sequence code generating unit includes:
a spliced sequence generation subunit, configured to splice the leukocyte antigen sequence and the peptide fragment sequence through the encoder to obtain a spliced sequence;
a sequence code generation subunit, configured to encode the spliced sequence through the encoder to obtain a sequence code.
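The two subunits above (splice, then encode) can be sketched as follows. This is an illustration under stated assumptions: the text here does not specify the encoding scheme, so one-hot encoding over the 20 standard amino acids is used as a simple stand-in, and the splicing order (leukocyte antigen sequence first, peptide fragment sequence second) and all function names are hypothetical.

```python
from typing import List

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {residue: i for i, residue in enumerate(AMINO_ACIDS)}


def splice(hla_sequence: str, peptide_sequence: str) -> str:
    """Concatenate the two sequences (the order is an assumption)."""
    return hla_sequence + peptide_sequence


def one_hot_encode(sequence: str) -> List[List[int]]:
    """Encode each residue as a 20-dimensional one-hot vector.

    Raises KeyError for a residue outside the 20 standard amino acids.
    """
    rows = []
    for residue in sequence:
        row = [0] * len(AMINO_ACIDS)
        row[INDEX[residue]] = 1
        rows.append(row)
    return rows


def encode_pair(hla_sequence: str, peptide_sequence: str) -> List[List[int]]:
    """Splice the two sequences, then encode the spliced sequence."""
    return one_hot_encode(splice(hla_sequence, peptide_sequence))
```

A learned embedding could replace the one-hot step without changing the splice-then-encode order.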
Optionally, the decoder is constructed based on a gating mechanism and an attention mechanism.
In the technical solution of the above embodiment, the affinity score determining unit includes:
a target sequence feature extraction subunit, configured to extract, by using the decoder, a target sequence feature according to the sequence code;
an affinity score determining subunit, configured to determine the affinity score according to the target sequence feature.
In the technical solution of the foregoing embodiment, the target sequence feature extraction subunit includes:
a sequence feature extraction subunit, configured to extract, through the decoder, sequence features from the sequence code based on the gating mechanism;
a screened sequence feature obtaining subunit, configured to screen, through the decoder, the sequence features based on the attention mechanism to obtain screened sequence features;
a feature extraction subunit, configured to perform, through the decoder, feature extraction on the screened sequence features based on the gating mechanism to obtain the target sequence feature.
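The three decoder steps (gated feature extraction, attention-based screening, gated feature extraction again) can be illustrated with a toy forward pass. Everything below is an assumption made for the sketch: the per-position gated unit (a sigmoid gate multiplied by a tanh candidate) stands in for whatever gated layer the decoder actually uses, the weights are untrained random values, and the final pooling and sigmoid output are illustrative choices rather than the embodiment's design.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def gated_layer(sequence, w_gate, w_cand):
    """Per-position gated unit: sigmoid(gate) * tanh(candidate)."""
    out = []
    for x in sequence:
        gate = [sigmoid(v) for v in matvec(w_gate, x)]
        cand = [math.tanh(v) for v in matvec(w_cand, x)]
        out.append([g * c for g, c in zip(gate, cand)])
    return out


def attention_screen(features, query):
    """Scale each position by its softmax attention weight."""
    logits = [sum(f * q for f, q in zip(feat, query)) for feat in features]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    screened = [[w * f for f in feat] for w, feat in zip(weights, features)]
    return screened, weights


def decode(sequence_code, hidden=4, seed=0):
    """Toy decoder: gate -> attention -> gate -> pooled sigmoid score."""
    rng = random.Random(seed)  # untrained, reproducible random weights
    dim = len(sequence_code[0])
    w1_gate = random_matrix(hidden, dim, rng)
    w1_cand = random_matrix(hidden, dim, rng)
    query = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    w2_gate = random_matrix(hidden, hidden, rng)
    w2_cand = random_matrix(hidden, hidden, rng)
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]

    features = gated_layer(sequence_code, w1_gate, w1_cand)   # sequence features
    screened, _ = attention_screen(features, query)           # screened features
    target = gated_layer(screened, w2_gate, w2_cand)          # target features
    pooled = [sum(column) for column in zip(*target)]         # sum over positions
    return sigmoid(sum(w * p for w, p in zip(w_out, pooled)))
```

Because the weights are random, the returned value only demonstrates the data flow; a meaningful affinity score requires trained parameters.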
In the technical solution of the above embodiment, the tumor neoantigen determining module 330 is specifically configured to determine that the peptide fragment sequence is a tumor neoantigen when the affinity score reaches a preset threshold.
According to the technical solution of this embodiment, a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient are obtained, and the two sequences are input into a trained affinity score prediction model to obtain their affinity score. The affinity score can thus be determined automatically: there is no need to construct a metering feature matrix by setting features based on similar motifs or by setting scores based on the contribution of each motif, because the affinity score prediction model extracts features automatically and can extract deeper features, which improves the accuracy of the affinity score. Whether the peptide fragment sequence is a tumor neoantigen is determined according to the affinity score, which reduces false positives in tumor neoantigen screening, solves the problem of low accuracy and efficiency in determining tumor neoantigens, improves the accuracy and efficiency of determining tumor neoantigens, and reduces program complexity and labor cost.
The tumor neoantigen prediction device provided by the embodiment of the invention can execute the tumor neoantigen prediction method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example IV
Fig. 7 is a schematic structural diagram of a tumor neoantigen prediction device according to a fourth embodiment of the present invention. As shown in fig. 7, the tumor neoantigen prediction device includes a processor 410, a memory 420, an input device 430 and an output device 440. The device may include one or more processors 410; one processor 410 is taken as an example in fig. 7. The processor 410, the memory 420, the input device 430 and the output device 440 may be connected by a bus or in other ways; connection by a bus is taken as an example in fig. 7.
The memory 420, as a computer-readable storage medium, can be used to store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the tumor neoantigen prediction method in the embodiments of the present invention (for example, the sequence acquisition module 310, the affinity score acquisition module 320, and the tumor neoantigen determination module 330 in the tumor neoantigen prediction apparatus). By running the software programs, instructions and modules stored in the memory 420, the processor 410 executes the various functional applications and data processing of the tumor neoantigen prediction device, that is, implements the tumor neoantigen prediction method described above.
The memory 420 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and an application program required for at least one function; the data storage area may store data created according to the use of the terminal, and the like. In addition, the memory 420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 420 may further include memory located remotely from the processor 410, which may be connected to the tumor neoantigen prediction device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 430 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the tumor neoantigen prediction device. The output device 440 may include a display device such as a display screen.
Example V
A fifth embodiment of the present invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a tumor neoantigen prediction method, the method comprising:
obtaining leukocyte antigen sequences and peptide fragment sequences of tumor patients;
inputting the leukocyte antigen sequence and the peptide fragment sequence into a trained affinity score prediction model to obtain affinity scores of the leukocyte antigen sequence and the peptide fragment sequence;
determining whether the peptide fragment sequence is a tumor neoantigen based on the affinity score.
Of course, the storage medium containing the computer-executable instructions provided in the embodiments of the present invention is not limited to the method operations described above, and may also perform the relevant operations in the tumor neoantigen prediction method provided in any embodiment of the present invention.
From the above description of the embodiments, it will be clear to a person skilled in the art that the present invention may be implemented by means of software together with the necessary general-purpose hardware, and of course also by means of hardware alone, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The software product may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a flash memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the method according to the embodiments of the present invention.
It should be noted that, in the embodiment of the tumor neoantigen predicting apparatus, each unit and module included are only divided according to the functional logic, but not limited to the above division, so long as the corresponding functions can be implemented; in addition, the specific names of the functional units are also only for distinguishing from each other, and are not used to limit the protection scope of the present invention.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.

Claims (8)

1. A method for predicting a tumor neoantigen, comprising:
obtaining leukocyte antigen sequences and peptide fragment sequences of tumor patients;
inputting the leukocyte antigen sequence and the peptide fragment sequence into a trained affinity score prediction model to obtain affinity scores of the leukocyte antigen sequence and the peptide fragment sequence, wherein the affinity score prediction model comprises an encoder and a decoder;
determining whether the peptide fragment sequence is a tumor neoantigen according to the affinity score;
the step of inputting the leukocyte antigen sequence and the peptide fragment sequence into a trained affinity score prediction model to obtain affinity scores of the leukocyte antigen sequence and the peptide fragment sequence comprises the following steps:
encoding the leukocyte antigen sequence and the peptide fragment sequence by the encoder to obtain a sequence code;
determining, by the decoder, the affinity score from the sequence encoding;
wherein said encoding the leukocyte antigen sequence and the peptide fragment sequence by the encoder to obtain a sequence code comprises:
splicing the leukocyte antigen sequence and the peptide fragment sequence through the encoder to obtain a spliced sequence;
and encoding the spliced sequence through the encoder to obtain a sequence code.
2. The method of claim 1, wherein the decoder is constructed based on a gating mechanism and an attention mechanism.
3. The method of claim 2, wherein said determining, by said decoder, said affinity score from said sequence encoding comprises:
extracting, by the decoder, a target sequence feature from the sequence code;
and determining the affinity score according to the target sequence characteristics.
4. A method according to claim 3, wherein said extracting, by said decoder, target sequence features from said sequence codes comprises:
extracting sequence features from the sequence codes based on the gating mechanism by the decoder;
screening the sequence features based on the attention mechanism by the decoder to obtain screened sequence features;
and carrying out feature extraction processing on the screening sequence features based on the gating mechanism by the decoder to obtain target sequence features.
5. The method of claim 1, wherein determining whether the peptide fragment sequence is a tumor neoantigen based on the affinity score comprises:
and when the affinity score reaches a preset threshold value, determining the peptide fragment sequence as tumor neoantigen.
6. A tumor neoantigen predicting apparatus, comprising:
the sequence acquisition module is used for acquiring a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient;
the affinity score obtaining module is used for inputting the leukocyte antigen sequence and the peptide fragment sequence into a trained affinity score prediction model to obtain affinity scores of the leukocyte antigen sequence and the peptide fragment sequence, wherein the affinity score prediction model comprises an encoder and a decoder;
a tumor neoantigen determination module for determining whether the peptide fragment sequence is a tumor neoantigen based on the affinity score;
wherein the affinity score obtaining module includes:
a sequence code generating unit for encoding the leukocyte antigen sequence and the peptide fragment sequence by the encoder to obtain a sequence code;
an affinity score determining unit for determining, by the decoder, the affinity score from the sequence code;
wherein the sequence code generation unit includes:
a spliced sequence generation subunit, configured to splice the leukocyte antigen sequence and the peptide fragment sequence through the encoder to obtain a spliced sequence;
and the sequence code generation subunit is used for encoding the spliced sequence through the encoder to obtain a sequence code.
7. A tumor neoantigen determination apparatus, characterized in that the tumor neoantigen determination apparatus comprises:
one or more processors;
a storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the tumor neoantigen prediction method of any one of claims 1-5.
8. A computer readable storage medium having stored thereon a computer program, characterized in that the program when executed by a processor implements the tumor neoantigen prediction method according to any one of claims 1-5.
CN202110303245.1A 2021-03-22 2021-03-22 Tumor neogenesis antigen prediction method, device, equipment and medium Active CN112908421B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110303245.1A CN112908421B (en) 2021-03-22 2021-03-22 Tumor neogenesis antigen prediction method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN112908421A CN112908421A (en) 2021-06-04
CN112908421B true CN112908421B (en) 2024-02-06

Family

ID=76105914

Country Status (1)

Country Link
CN (1) CN112908421B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114464247A (en) * 2022-01-30 2022-05-10 腾讯科技(深圳)有限公司 Method and device for predicting binding affinity based on antigen and antibody sequences
CN114561472A (en) * 2022-04-27 2022-05-31 普瑞基准科技(北京)有限公司 Kit for detecting or assisting in detecting tumor-related gene variation and application thereof

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109671469A (en) * 2018-12-11 2019-04-23 浙江大学 The method for predicting marriage relation and binding affinity between polypeptide and HLA I type molecule based on Recognition with Recurrent Neural Network
CN109706065A (en) * 2018-12-29 2019-05-03 深圳裕策生物科技有限公司 Tumor neogenetic antigen load detection device and storage medium
KR102159921B1 (en) * 2020-03-24 2020-09-25 주식회사 테라젠바이오 Method for predicting neoantigen using a peptide sequence and hla allele sequence and computer program
CN111815614A (en) * 2020-07-17 2020-10-23 中国人民解放军军事科学院军事医学研究院 Parasite detection method and system based on artificial intelligence and terminal equipment
KR102184720B1 (en) * 2019-10-11 2020-11-30 한국과학기술원 Prediction method for binding preference between mhc and peptide on cancer cell and analysis apparatus

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10049106B2 (en) * 2017-01-18 2018-08-14 Xerox Corporation Natural language generation through character-based recurrent neural networks with finite-state prior knowledge
US20200243164A1 (en) * 2019-01-30 2020-07-30 Bioinformatics Solutions Inc. Systems and methods for patient-specific identification of neoantigens by de novo peptide sequencing for personalized immunotherapy
US11651860B2 (en) * 2019-05-15 2023-05-16 International Business Machines Corporation Drug efficacy prediction for treatment of genetic disease
AU2020327939A1 (en) * 2019-08-09 2022-03-24 Immatics Biotechnologies Gmbh Methods for peptide mass spectrometry fragmentation prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant