CN112908421A - Tumor neogenesis antigen prediction method, device, equipment and medium - Google Patents

Tumor neogenesis antigen prediction method, device, equipment and medium

Info

Publication number
CN112908421A
Authority
CN
China
Prior art keywords
sequence
tumor
peptide fragment
affinity
affinity score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110303245.1A
Other languages
Chinese (zh)
Other versions
CN112908421B (en)
Inventor
彭鑫鑫
米玉涛
季序我
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Precision Scientific Technology Beijing Co ltd
Predatum Biomedicine Suzhou Co ltd
Original Assignee
Precision Scientific Technology Beijing Co ltd
Predatum Biomedicine Suzhou Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Precision Scientific Technology Beijing Co ltd, Predatum Biomedicine Suzhou Co ltd filed Critical Precision Scientific Technology Beijing Co ltd
Priority to CN202110303245.1A priority Critical patent/CN112908421B/en
Publication of CN112908421A publication Critical patent/CN112908421A/en
Application granted granted Critical
Publication of CN112908421B publication Critical patent/CN112908421B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/01Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Analytical Chemistry (AREA)
  • Bioethics (AREA)
  • Chemical & Material Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Public Health (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
  • Peptides Or Proteins (AREA)

Abstract

The embodiment of the invention discloses a method, a device, equipment and a medium for predicting tumor neoantigens. The method comprises the following steps: obtaining a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient; inputting the leukocyte antigen sequence and the peptide fragment sequence into a trained affinity score prediction model to obtain the affinity score of the leukocyte antigen sequence and the peptide fragment sequence; and determining whether the peptide fragment sequence is a tumor neoantigen according to the affinity score. According to the technical scheme of the embodiment of the invention, the affinity score prediction model extracts features automatically and can extract deeper features, which improves the accuracy of the affinity score. This addresses the low accuracy and efficiency of determining tumor neoantigens, improving both while reducing labor cost.

Description

Tumor neogenesis antigen prediction method, device, equipment and medium
Technical Field
The embodiment of the invention relates to the technical field of bioinformatics, and in particular to a method, a device, equipment and a medium for predicting tumor neoantigens.
Background
A tumor neoantigen is a "non-self" polypeptide, derived from a new protein, that is recognized by human antigen-presenting cells and does not originally exist in the human body. Such polypeptides are formed mainly by the degradation of mutant proteins produced by mutations in tumor cells. Tumor neoantigens are a key factor in triggering the initial response of the body's immune system to tumor cells.
At present, prediction methods for tumor neoantigens fall into three main types: the first is structure-based; the second predicts the affinity value of the peptide fragment and the neoantigen from a position-specific scoring matrix; the third is based on machine learning.
The structure-based method calculates the minimum free energy of the peptide fragment-HLA complex. Because the number of available crystal structures is limited, prediction is very slow and inaccurate.
The scoring-matrix method predicts the affinity value of the peptide fragment and the neoantigen from a position-specific scoring matrix. Its linear computational complexity is much lower than the nonlinear complexity of the structure-based and machine learning-based methods. However, features must be set for similar motifs, a position-specific scoring function must be constructed, and expert experience must be incorporated, so the process is complex and tedious and the accuracy is low.
Machine learning-based methods use models such as support vector machines, hidden Markov models and simple neural networks. They report an average AUC over a large number of HLA types and provide good prediction tools. However, the contribution of each residue at each position in the peptide must be considered in order to construct a quantitative matrix that is then input into the machine learning model. The residue contribution scores have to be reconsidered repeatedly during this process, which demands substantial professional knowledge and experience. High-level features cannot be extracted automatically, and prediction is not accurate enough for the few HLA types that appear in most people. Because of the nonlinearity of high-level features, these methods are also inefficient at predicting the large number of peptides generated from whole-genome and transcriptome sequencing data.
Combined approaches improve performance, but the results are still unsatisfactory.
Disclosure of Invention
The embodiment of the invention provides a tumor neoantigen prediction method, a tumor neoantigen prediction device, tumor neoantigen prediction equipment and a tumor neoantigen prediction medium, so that the effects of improving the accuracy of determining tumor neoantigens and reducing the labor cost are achieved.
In a first aspect, the embodiments of the present invention provide a method for predicting tumor neoantigen, the method including:
obtaining a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient;
inputting the leukocyte antigen sequence and the peptide fragment sequence into a trained affinity score prediction model to obtain the affinity score of the leukocyte antigen sequence and the peptide fragment sequence;
and determining whether the peptide fragment sequence is a tumor neoantigen according to the affinity score.
In a second aspect, the embodiments of the present invention also provide a device for predicting tumor neoantigen, the device including:
the sequence acquisition module is used for acquiring a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient;
the affinity score acquisition module is used for inputting the leukocyte antigen sequence and the peptide segment sequence into a trained prediction model to obtain the affinity scores of the leukocyte antigen sequence and the peptide segment sequence;
and the tumor neogenesis antigen determination module is used for determining whether the peptide segment sequence is the tumor neogenesis antigen according to the affinity score.
In a third aspect, an embodiment of the present invention further provides a tumor neoantigen prediction apparatus, where the tumor neoantigen prediction apparatus includes:
one or more processors;
storage means for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the tumor neoantigen prediction method provided by any embodiment of the invention.
In a fourth aspect, the embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, where the program, when executed by a processor, implements the method for predicting tumor neoantigen provided in any embodiment of the present invention.
According to the technical scheme of the embodiment of the invention, a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient are obtained; inputting the leucocyte antigen sequence and the peptide segment sequence into a trained affinity score prediction model to obtain affinity scores of the leucocyte antigen sequence and the peptide segment sequence; affinity scores of a leukocyte antigen sequence and a peptide fragment sequence can be automatically determined, characteristics can be automatically extracted through an affinity score prediction model without setting characteristics based on similar motifs or setting scores for contribution of each motif, and deeper characteristics can be extracted, so that the accuracy of the affinity scores is improved; and determining whether the peptide fragment sequence is the tumor neoantigen according to the affinity value, so that the problems of low accuracy and efficiency in determining the tumor neoantigen are solved, the accuracy and efficiency in determining the tumor neoantigen are improved, and the labor cost is reduced.
Drawings
FIG. 1 is a flowchart of a tumor neoantigen prediction method according to the first embodiment of the present invention;
FIG. 2 is a schematic diagram comparing the prediction accuracy and loss values on the training set and the validation set for the tumor neoantigen deep learning model DeepTNA according to the first embodiment of the present invention;
FIG. 3 is a schematic diagram of the tumor neoantigen DeepTNA model according to the first embodiment of the present invention;
FIG. 4 is a flowchart of a tumor neoantigen prediction method according to the second embodiment of the present invention;
FIG. 5 is a comparison of tumor neoantigen prediction accuracy against an existing prediction model according to the second embodiment of the present invention;
FIG. 6 is a schematic diagram of a tumor neoantigen prediction apparatus according to the third embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a tumor neoantigen prediction apparatus according to the fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
FIG. 1 is a flowchart of a method for predicting tumor neoantigens according to an embodiment of the present invention. The method can be performed by a tumor neoantigen prediction device and includes the following steps:
S110, obtaining a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient.
At present, tumor neoantigens can be predicted from the affinity between the leukocyte antigen sequence and the peptide fragment sequence of a tumor patient. Human leukocyte antigens are highly polymorphic alloantigens. Chemically, they are glycoproteins formed by the non-covalent association of a glycosylated alpha heavy chain and a beta light chain. The amino-terminal part of the peptide chain faces outwards (about 3/4 of the whole molecule), the carboxy-terminal part extends into the cytoplasm, and the central hydrophobic part lies in the membrane. A peptide fragment is a chain formed by the dehydration condensation of amino acids. Optionally, the leukocyte antigen sequence and the peptide fragment sequence of the tumor patient are obtained from the Immune Epitope Database (IEDB) or from related literature. The affinity score of the two sequences is then obtained from them, and whether the peptide fragment sequence is a tumor neoantigen is determined according to that score.
S120, inputting the leukocyte antigen sequence and the peptide segment sequence into a trained affinity score prediction model to obtain the affinity scores of the leukocyte antigen sequence and the peptide segment sequence.
The obtained leukocyte antigen sequence and peptide fragment sequence are input into the trained affinity score prediction model, which directly outputs their affinity score. There is no need to set features based on similar motifs, or to assign a score to the contribution of each motif, and then predict the affinity score either with a pre-designed score calculation function or by inputting a scoring feature matrix into a traditional machine learning model such as a random forest or decision tree model; this reduces labor cost. The accuracy is also greatly improved compared with directly splicing the leukocyte antigen sequence and the peptide fragment sequence and inputting them into a simple neural network or an RNN model. Nor is there any need to process the sequences into a regular two-dimensional matrix and input that matrix into a trained deep learning network, such as a convolutional neural network, to obtain the affinity value; this reduces the complexity of data processing.
As shown in FIG. 2, sample data from the training set are input into the affinity prediction model to be trained to obtain the training-set accuracy and loss value of the affinity score, and sample data from the validation set are input into the trained affinity prediction model to obtain the validation-set accuracy and loss value. As can be seen from FIG. 2, both loss values are small (0.26 on the training set and 0.34 on the validation set), and the validation-set accuracy (87%) is very close to the training-set accuracy (89%). The deep learning model DeepTNA therefore shows no sign of over-fitting or under-fitting and has high accuracy, so the resulting affinity prediction model DeepTNA can accurately predict the affinity value.
Optionally, as shown in FIG. 3, the affinity score prediction model includes an encoder and a decoder. Inputting the leukocyte antigen sequence and the peptide fragment sequence into the trained affinity score prediction model to obtain their affinity score comprises: encoding the leukocyte antigen sequence and the peptide fragment sequence through the encoder to obtain a sequence code; and determining, by the decoder, the affinity score from the sequence code. To reduce labor cost, features need not be set based on similar motifs and the contribution of each motif need not be scored, which saves the cost of involving experts and of manually setting or labeling features. The peptide fragments also need not be processed into a regular two-dimensional matrix, which reduces the complexity of data processing. The leukocyte antigen sequence and the peptide fragment sequence are input into a trained encoder-decoder deep learning model to obtain their affinity score.
The encoder in the affinity score prediction model encodes the leukocyte antigen sequence and the peptide fragment sequence, converting them into codes that a computer can process. The encoded sequences are input into the decoder of the affinity score prediction model, which predicts the affinity score of the leukocyte antigen sequence and the peptide fragment sequence from them. Compared with setting features based on similar motifs, or scoring the contribution of each motif to construct a quantitative feature matrix and then calculating the affinity score from the set features, this improves the efficiency of affinity score prediction. Further, when training the affinity score prediction model, the range of peptide fragment lengths in the training samples is widened; optionally, the range may include 8-15 mers. The trained affinity score prediction model can therefore make predictions for a wider range of peptide fragments.
Optionally, encoding the leukocyte antigen sequence and the peptide fragment sequence through the encoder to obtain a sequence code comprises: splicing the leukocyte antigen sequence and the peptide fragment sequence by the encoder to obtain a spliced sequence; and encoding the spliced sequence through the encoder to obtain the sequence code. As shown in FIG. 3, before the encoder part of the affinity score prediction model encodes the two sequences, the sequence features extracted from the leukocyte antigen sequence and the peptide fragment sequence are input into the integration layer and spliced into one long vector, and the spliced sequence is then encoded. Encoding the spliced sequence converts the leukocyte antigen sequence and the peptide fragment sequence into codes that a computer can process, which makes the subsequent decoding operation straightforward.
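The splicing and encoding step can be illustrated with a minimal Python sketch. The source does not specify how residues are represented, so the 20-letter amino acid alphabet, the integer index scheme, the padding index 0, the function names and the placeholder sequences below are all assumptions made for illustration only.

```python
# Hypothetical integer encoding of a spliced leukocyte antigen + peptide sequence.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed alphabet)
PAD = 0                               # assumed padding index
AA_TO_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}

MIN_PEPTIDE_LEN = 8    # the text gives 8-15 mers as an optional length range
MAX_PEPTIDE_LEN = 15


def encode(sequence, length):
    """Map residues to integer indices and right-pad to a fixed length."""
    if len(sequence) > length:
        raise ValueError("sequence is longer than the fixed length")
    indices = [AA_TO_INDEX[aa] for aa in sequence]
    return indices + [PAD] * (length - len(indices))


def splice_and_encode(hla_sequence, peptide):
    """Concatenate the two encoded sequences into one long vector."""
    if not MIN_PEPTIDE_LEN <= len(peptide) <= MAX_PEPTIDE_LEN:
        raise ValueError("peptide length is outside the 8-15 mer range")
    hla_code = encode(hla_sequence, len(hla_sequence))
    peptide_code = encode(peptide, MAX_PEPTIDE_LEN)
    return hla_code + peptide_code


# Placeholder sequences, not real patient data.
code = splice_and_encode("YFAMYGEKVA", "ACDEFGHI")
print(len(code))  # 10 antigen positions + 15 padded peptide positions
```

Padding the peptide to the maximum length keeps the spliced vector the same size for every peptide in the range, which is one simple way to feed variable-length peptides to a fixed-size network.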
Optionally, the decoder is constructed based on a gating mechanism and an attention mechanism. In a neural network, earlier hidden layers learn more slowly than later hidden layers, so classification accuracy can fall as the number of hidden layers increases. This phenomenon is called the vanishing gradient problem: the gradient decays because it is computed as a product of many factors. Conversely, if very large values appear in that product, the computed gradient becomes very large. When optimization reaches a cliff, an update with such a gradient takes a very large step and may leave the reasonable region at once. This phenomenon is called gradient explosion. A gating mechanism mitigates both gradient explosion and gradient vanishing; for example, a network with multiplicative gates, such as a GRU or an LSTM, can be used. The GRU network is a variant of the LSTM network with a simpler structure and better results than the LSTM network, so the GRU network is preferred for mitigating gradient explosion and gradient vanishing.
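To make the multiplicative gate structure concrete, the following is a minimal NumPy sketch of a single GRU step applied along a sequence. It is a generic textbook GRU, not the patent's trained network: the layer sizes, the random weights, the helper names and the gate convention `h = (1 - z) * h_prev + z * h_cand` are assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(input_size, hidden_size, rng):
    """Small random weights stand in for weights a trained model would learn."""
    params = {}
    for gate in ("z", "r", "h"):
        params["W" + gate] = rng.normal(scale=0.1, size=(hidden_size, input_size))
        params["U" + gate] = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
        params["b" + gate] = np.zeros(hidden_size)
    return params


def gru_step(x, h_prev, p):
    """One GRU time step with an update gate z and a reset gate r."""
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h_prev + p["bz"])  # how much to update
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h_prev + p["br"])  # how much past to use
    h_cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h_prev) + p["bh"])
    # Gated blend of old state and candidate state.
    return (1.0 - z) * h_prev + z * h_cand


def run_gru(inputs, hidden_size, params):
    """Apply gru_step over every position and return all hidden states."""
    h = np.zeros(hidden_size)
    states = []
    for x in inputs:
        h = gru_step(x, h, params)
        states.append(h)
    return np.stack(states)


rng = np.random.default_rng(0)
params = init_params(input_size=4, hidden_size=8, rng=rng)
inputs = rng.normal(size=(25, 4))  # 25 positions with 4 features each (made up)
states = run_gru(inputs, 8, params)
print(states.shape)  # (25, 8)
```

Because each new state is a gated blend of the previous state and a bounded candidate, the hidden state stays bounded, and the gates give gradients a path through time that does not rely on one repeated multiplication.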
The attention mechanism mimics the internal process of biological observation, that is, it aligns internal experience with external perception to increase how finely a particular region is observed. An attention mechanism lets the neural network focus on a subset of its inputs (or features), so that important features of sparse data can be extracted quickly. It can also reduce the dimensionality of the input data, so that the prediction output by the model concentrates on the key parts selected by attention. This improves the efficiency and accuracy with which the neural network model processes data.
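A minimal NumPy sketch of attention-based screening follows. The source does not state which attention form is used, so dot-product scoring against a single query vector, the function name and the toy feature values are assumptions for illustration.

```python
import numpy as np


def attention_screen(features, query):
    """Reweight per-position features by softmax attention weights.

    features: array of shape (T, H), one feature vector per sequence position.
    query:    array of shape (H,), an assumed learned scoring vector.
    """
    scores = features @ query                 # one relevance score per position
    scores = scores - scores.max()            # subtract max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    screened = weights[:, None] * features    # emphasise high-weight positions
    return weights, screened


# Toy features for three positions (made-up values).
features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([1.0, 1.0])
weights, screened = attention_screen(features, query)
print(weights.round(3))
```

The weights are non-negative and sum to one, so positions with low scores are scaled towards zero while the most relevant positions keep most of their magnitude.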
Optionally, determining, by the decoder, the affinity score according to the sequence code comprises: extracting, by the decoder, target sequence features from the sequence code; and determining the affinity score according to the target sequence features. The encoded leukocyte antigen sequence and peptide fragment sequence are input into the decoder part of the affinity score prediction model, and the decoder extracts features from them, so that feature extraction is automated. Compared with setting features based on similar motifs, or scoring the contribution of each motif to construct a feature matrix, extracting features through the decoder yields features of the leukocyte antigen sequence and the peptide fragment sequence that carry deeper, higher-level meaning, which improves the accuracy and efficiency of affinity score prediction.
Optionally, extracting, by the decoder, the target sequence features according to the sequence code comprises: extracting, by the decoder, sequence features from the sequence code based on the gating mechanism; screening the sequence features by the decoder based on the attention mechanism to obtain screened sequence features; and performing feature extraction on the screened sequence features through the decoder based on the gating mechanism to obtain the target sequence features. The decoder first extracts features from the codes of the leukocyte antigen sequence and the peptide fragment sequence, then screens the extracted sequence features with the attention mechanism, so that the features with a large influence on the affinity score are selected and feature extraction becomes more efficient. The screened features are then processed further to mine and analyze the more influential features, giving the target sequence features. The affinity score of the leukocyte antigen sequence and the peptide fragment sequence is obtained from the target sequence features. Illustratively, the target sequence features are input into a sigmoid function, which maps them to [0,1] and so yields an affinity score in the range [0,1]. Obtaining the affinity score from the target sequence features improves the efficiency and accuracy of the affinity score.
Predicting the affinity value of the leukocyte antigen sequence and the peptide fragment sequence with an encoder-decoder deep learning model allows high-level features of the two sequences to be extracted automatically, and the affinity prediction model is also easier to update and deploy. The acquired leukocyte antigen sequence and peptide fragment sequence can be input into the affinity prediction model directly to obtain the affinity value, without data preprocessing, which reduces the complexity of data processing. No manual intervention is needed, and the efficiency and accuracy of obtaining the affinity value are improved.
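The final mapping from target sequence features to a score in [0,1] can be sketched as follows. The text only states that a sigmoid function is applied; the linear projection before the sigmoid, the weights, the bias and the feature values are hypothetical and included only to make the example runnable.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function mapping any real number into [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def affinity_score(target_features, weights, bias):
    """Project the feature vector to one logit (assumed step), then squash it."""
    logit = sum(f * w for f, w in zip(target_features, weights)) + bias
    return sigmoid(logit)


# Made-up feature values and projection weights.
score = affinity_score([0.8, -0.3, 1.2], [1.5, 0.7, -0.4], 0.1)
print(round(score, 3))
```

Larger logits give scores closer to 1 and smaller logits give scores closer to 0, so the output can be compared directly against a threshold in the same range.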
S130, determining whether the peptide fragment sequence is a tumor neoantigen according to the affinity score.
Tumor-specific antigens (TSAs) are antigens recognized by T cells. Genomic variants in tumors are expressed as tumor-specific peptides (neo-epitopes), which are defined as neoantigens. Unlike tumor-associated antigens, neoantigens are present only in tumor cells. High-quality tumor neoantigens are generally mutant peptides with a higher affinity for leukocyte antigens than the corresponding normal peptides, so whether a peptide fragment is a tumor neoantigen is determined from the affinity score of the leukocyte antigen sequence and the peptide fragment sequence.
According to the technical scheme of the embodiment of the invention, a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient are obtained, and the two sequences are input into a trained affinity score prediction model to obtain their affinity score. The affinity score can thus be determined automatically, without constructing a quantitative feature matrix by setting features based on similar motifs or by scoring the contribution of each motif. The affinity score prediction model extracts features automatically and can extract deeper features, which improves the accuracy of the affinity score. Whether the peptide fragment sequence is a tumor neoantigen is then determined according to the affinity score. This addresses the low accuracy and efficiency of determining tumor neoantigens, improves both, and reduces software implementation complexity and labor cost.
Example two
FIG. 4 is a flowchart of a method for predicting tumor neoantigens according to the second embodiment of the present invention. This embodiment further refines the first embodiment: determining whether the peptide fragment sequence is a tumor neoantigen according to the affinity score comprises determining that the peptide fragment sequence is a tumor neoantigen when the affinity score reaches a preset threshold. In the prior art, tumor neoantigens are generally predicted from tumor expression data, and because the affinities of the normal peptide fragment and the mutant peptide for leukocyte antigens are not compared, the screening can produce false positives. Determining whether the peptide fragment sequence is a tumor neoantigen from the affinity score can improve the accuracy of tumor neoantigen prediction.
As shown in fig. 4, the method specifically includes the following steps:
S210, obtaining a leukocyte antigen sequence and a peptide fragment sequence of the tumor patient.
S220, inputting the leukocyte antigen sequence and the peptide segment sequence into a trained affinity score prediction model to obtain the affinity scores of the leukocyte antigen sequence and the peptide segment sequence.
S230, when the affinity score reaches a preset threshold value, determining the peptide fragment sequence as a tumor neoantigen.
Whether the obtained peptide fragment sequence is a tumor neoantigen is determined from the affinity score of the leukocyte antigen sequence and the peptide fragment sequence output by the affinity score prediction model. A tumor neoantigen is generally a mutant peptide that has a higher affinity for the leukocyte antigen than the normal peptide. Optionally, an affinity score threshold is preset: when the affinity score of the current peptide fragment sequence and the leukocyte antigen is higher than the preset threshold, the current peptide fragment sequence is determined to be a tumor neoantigen; when the affinity score is lower than the preset threshold, the current peptide fragment sequence is determined not to be a tumor neoantigen. In the prior art, tumor neoantigens are predicted from tumor expression data, and the screening produces false positives. Determining whether the peptide fragment sequence is a tumor neoantigen from the affinity score improves the accuracy of tumor neoantigen prediction and reduces false positives in tumor neoantigen screening. Taking the existing, relatively accurate Average Relative Binding (ARB) model as an example, fig. 5 shows the accuracy for each neoantigen type in the test set as predicted by the ARB model and as predicted by the tumor neoantigen prediction model provided in this embodiment, i.e., the DeepTNA model. As can be seen from fig. 5, the DeepTNA model achieves higher accuracy than the existing ARB model for most tumor neoantigen types in the test set, and is slightly less accurate than the ARB model for only two types, so the method provided by this embodiment predicts tumor neoantigens well.
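The threshold decision of step S230 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name `is_tumor_neoantigen`, the default threshold of 0.5, the toy peptide strings and scores, and the choice to treat a score exactly equal to the threshold as "reaching" it are all hypothetical, since the text only states that a preset threshold exists.

```python
def is_tumor_neoantigen(affinity_score: float, threshold: float = 0.5) -> bool:
    """Return True when the affinity score reaches the preset threshold.

    The default threshold (0.5) is a hypothetical value chosen only for
    illustration; the source does not state one.
    """
    return affinity_score >= threshold


# Hypothetical peptide/score pairs, for illustration only.
scores = {"PEPTIDEA": 0.91, "PEPTIDEK": 0.12}
calls = {peptide: is_tumor_neoantigen(score) for peptide, score in scores.items()}
```

In practice the threshold would presumably be chosen on validation data for the trained model rather than fixed in advance.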
According to the technical solution of this embodiment of the invention, a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient are obtained, and the leukocyte antigen sequence and the peptide fragment sequence are input into a trained affinity score prediction model to obtain their affinity score. The affinity score is thus determined automatically: there is no need to set features based on similar motifs or to construct a quantitative feature matrix from score contributions assigned to each motif, because the affinity score prediction model extracts features automatically and can extract deeper features, which improves the accuracy of the affinity score. When the affinity score reaches a preset threshold, the peptide fragment sequence is determined to be a tumor neoantigen. This improves the accuracy of tumor neoantigen prediction, reduces false positives in tumor neoantigen screening, addresses the low accuracy and efficiency of determining tumor neoantigens, and reduces labor cost.
Example three
Fig. 6 is a structural diagram of a tumor neoantigen prediction apparatus according to a third embodiment of the present invention, the tumor neoantigen prediction apparatus including: a sequence acquisition module 310, an affinity score acquisition module 320, and a tumor neoantigen determination module 330.
The sequence acquisition module 310 is configured to acquire a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient; the affinity score acquisition module 320 is configured to input the leukocyte antigen sequence and the peptide fragment sequence into a trained prediction model to obtain the affinity score of the leukocyte antigen sequence and the peptide fragment sequence; and the tumor neoantigen determination module 330 is configured to determine whether the peptide fragment sequence is a tumor neoantigen according to the affinity score.
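The division into modules 310, 320 and 330 can be pictured with the following sketch. It is only one possible arrangement under stated assumptions: the class name `TumorNeoantigenPredictor`, the record keys `hla_sequence` and `peptide_sequence`, the threshold of 0.5, and the `dummy_model` placeholder are all hypothetical, and `dummy_model` is not an affinity model of any kind; it merely stands in for the trained prediction model so that the example runs.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class TumorNeoantigenPredictor:
    """Toy arrangement of modules 310, 320 and 330 (names are illustrative)."""

    # Stand-in for the trained prediction model: any callable mapping
    # (leukocyte antigen sequence, peptide fragment sequence) to a score.
    score_model: Callable[[str, str], float]
    threshold: float = 0.5  # hypothetical preset threshold

    def acquire_sequences(self, record: dict) -> Tuple[str, str]:
        # Module 310: obtain the patient's leukocyte antigen and peptide sequences.
        return record["hla_sequence"], record["peptide_sequence"]

    def affinity_score(self, hla: str, peptide: str) -> float:
        # Module 320: delegate to the trained model.
        return self.score_model(hla, peptide)

    def determine(self, score: float) -> bool:
        # Module 330: decide from the affinity score.
        return score >= self.threshold

    def predict(self, record: dict) -> bool:
        hla, peptide = self.acquire_sequences(record)
        return self.determine(self.affinity_score(hla, peptide))


def dummy_model(hla: str, peptide: str) -> float:
    # Placeholder only, NOT a real affinity model: the fraction of peptide
    # residues that also occur somewhere in the leukocyte antigen sequence.
    return sum(residue in hla for residue in peptide) / len(peptide)


predictor = TumorNeoantigenPredictor(score_model=dummy_model)
result = predictor.predict({"hla_sequence": "GSHSMRYF", "peptide_sequence": "SMRY"})
```

Keeping the model behind a plain callable means the acquisition and determination steps can be exercised without a trained network, which matches the note later in the text that the module division is purely functional.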
Optionally, the affinity score prediction model comprises an encoder and a decoder;
In the technical solution of the above embodiment, the affinity score acquisition module 320 includes:
a sequence code generation unit, configured to encode the leukocyte antigen sequence and the peptide fragment sequence through the encoder to obtain a sequence code;
an affinity score determination unit, configured to determine, by the decoder, the affinity score from the sequence code.
In the technical solution of the above embodiment, the sequence code generation unit includes:
a spliced sequence generation subunit, configured to splice the leukocyte antigen sequence and the peptide fragment sequence through the encoder to obtain a spliced sequence;
a sequence code generation subunit, configured to encode the spliced sequence through the encoder to obtain the sequence code.
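The splicing and encoding performed by these two subunits can be illustrated with a small sketch. The concrete scheme here is an assumption: this passage does not say how the spliced sequence is represented, so the separator symbol, the integer vocabulary, the padding value and the names `splice` and `encode` are hypothetical. The sketch shows only a token-index encoding of the spliced string, not the learned encoder itself.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acid letters
SEPARATOR = "-"                       # hypothetical separator between the two sequences
PAD = 0                               # hypothetical padding index

# Map every symbol to a positive integer; 0 is reserved for padding.
VOCAB = {symbol: index + 1 for index, symbol in enumerate(AMINO_ACIDS + SEPARATOR)}


def splice(hla_sequence: str, peptide_sequence: str) -> str:
    """Concatenate the leukocyte antigen and peptide sequences into one string."""
    return hla_sequence + SEPARATOR + peptide_sequence


def encode(spliced: str, max_length: int) -> list:
    """Turn a spliced sequence into a fixed-length list of integer indices."""
    if len(spliced) > max_length:
        raise ValueError("spliced sequence is longer than max_length")
    indices = [VOCAB[symbol] for symbol in spliced]  # KeyError on unknown symbols
    return indices + [PAD] * (max_length - len(indices))


spliced = splice("GSH", "AC")
sequence_code = encode(spliced, max_length=8)
```

A fixed-length integer code like this is a common input to an embedding layer; whether the embodiment uses such a layer is not stated in this passage.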
Optionally, the decoder is constructed based on a gating mechanism and an attention mechanism.
In the technical solution of the above embodiment, the affinity score determination unit includes:
a target sequence feature extraction subunit, configured to extract, by the decoder, target sequence features from the sequence code;
an affinity score determination subunit, configured to determine the affinity score from the target sequence features.
In the technical solution of the above embodiment, the target sequence feature extraction subunit includes:
a sequence feature extraction subunit, configured to extract, by the decoder, sequence features from the sequence code based on the gating mechanism;
a screened sequence feature acquisition subunit, configured to screen, by the decoder, the sequence features based on the attention mechanism to obtain screened sequence features;
a feature extraction subunit, configured to perform, by the decoder, feature extraction on the screened sequence features based on the gating mechanism to obtain the target sequence features.
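The order of operations in the decoder (gated feature extraction, attention-based screening, a second gated extraction, then a score) can be sketched in plain Python. This is a deliberately simplified toy, not the network of the embodiment: each position carries a single scalar feature, the gated pass is a GRU-like update gate rather than a full gated recurrent unit, the attention is a softmax over the feature values themselves, and the weights `w_gate` and `w_cand` are fixed hypothetical constants rather than trained parameters.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_pass(inputs: List[float], w_gate: float, w_cand: float) -> List[float]:
    """Toy gated recurrence over a sequence of scalar features.

    An update gate z decides how much of the previous state to keep and
    how much of the new candidate to take (a simplified, GRU-like update).
    """
    state, states = 0.0, []
    for x in inputs:
        z = sigmoid(w_gate * (x + state))            # update gate in (0, 1)
        candidate = math.tanh(w_cand * x + state)    # proposed new state
        state = (1.0 - z) * state + z * candidate
        states.append(state)
    return states


def attention_screen(features: List[float]) -> List[float]:
    """Scale each position by a softmax weight computed from its feature value."""
    peak = max(features)
    exps = [math.exp(f - peak) for f in features]    # numerically stable softmax
    total = sum(exps)
    return [f * e / total for f, e in zip(features, exps)]


def decode_affinity(sequence_code: List[float]) -> float:
    # Step 1: gated feature extraction from the sequence code.
    features = gated_pass(sequence_code, w_gate=1.0, w_cand=1.0)
    # Step 2: attention-based screening of those features.
    screened = attention_screen(features)
    # Step 3: a second gated pass yields the target sequence features.
    target = gated_pass(screened, w_gate=1.0, w_cand=1.0)
    # Map the final state to a score in (0, 1).
    return sigmoid(target[-1])


# Hypothetical sequence code, for illustration only.
score = decode_affinity([0.2, -0.4, 0.9, 0.1])
```

A real decoder would operate on vector-valued features with learned weight matrices; the sketch is meant only to make the three-stage data flow concrete.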
In the technical solution of the above embodiment, the tumor neoantigen determination module 330 is specifically configured to determine that the peptide fragment sequence is a tumor neoantigen when the affinity score reaches a preset threshold.
According to the technical solution of this embodiment of the invention, a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient are obtained, and the leukocyte antigen sequence and the peptide fragment sequence are input into a trained affinity score prediction model to obtain their affinity score. The affinity score is thus determined automatically: there is no need to set features based on similar motifs or to construct a quantitative feature matrix from score contributions assigned to each motif, because the affinity score prediction model extracts features automatically and can extract deeper features, which improves the accuracy of the affinity score. Whether the peptide fragment sequence is a tumor neoantigen is determined according to the affinity score. This reduces false positives in tumor neoantigen screening, addresses the low accuracy and efficiency of determining tumor neoantigens, and reduces program complexity and labor cost.
The tumor neoantigen prediction device provided by the embodiment of the invention can execute the tumor neoantigen prediction method provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to the executed method.
Example four
Fig. 7 is a schematic structural diagram of a tumor neoantigen prediction apparatus according to a fourth embodiment of the present invention, as shown in fig. 7, the tumor neoantigen prediction apparatus includes a processor 410, a memory 420, an input device 430, and an output device 440; the number of the processors 410 in the tumor neogenesis antigen prediction device may be one or more, and one processor 410 is taken as an example in fig. 7; the processor 410, the memory 420, the input device 430 and the output device 440 of the tumor neogenesis antigen prediction apparatus may be connected by a bus or other means, and fig. 7 illustrates the connection by the bus as an example.
The memory 420 serves as a computer-readable storage medium for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the tumor neoantigen prediction method in the embodiment of the present invention (for example, the sequence acquisition module 310, the affinity score acquisition module 320, and the tumor neoantigen determination module 330 in the tumor neoantigen prediction apparatus). The processor 410 executes various functional applications and data processing of the tumor neoantigen prediction apparatus by executing software programs, instructions and modules stored in the memory 420, so as to implement the above-mentioned tumor neoantigen prediction method.
The memory 420 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the terminal, and the like. Further, the memory 420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 420 may further include memory located remotely from the processor 410, which may be connected to the tumor neogenesis antigen prediction device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 430 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the tumor neogenesis antigen prediction apparatus. The output device 440 may include a display device such as a display screen.
Example five
Embodiments of the present invention also provide a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a method for tumor neoantigen prediction, the method comprising:
obtaining a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient;
inputting the leukocyte antigen sequence and the peptide fragment sequence into a trained affinity score prediction model to obtain the affinity score of the leukocyte antigen sequence and the peptide fragment sequence;
and determining whether the peptide fragment sequence is a tumor neoantigen according to the affinity score.
Of course, in the storage medium containing computer-executable instructions provided by the embodiments of the present invention, the computer-executable instructions are not limited to the method operations described above, and can also perform related operations in the tumor neoantigen prediction method provided by any embodiment of the present invention.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the embodiment of the tumor neoantigen prediction apparatus, the included units and modules are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be realized; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A method for predicting a neoantigen of a tumor, comprising:
obtaining a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient;
inputting the leukocyte antigen sequence and the peptide fragment sequence into a trained affinity score prediction model to obtain affinity scores of the leukocyte antigen sequence and the peptide fragment sequence;
and determining whether the peptide fragment sequence is a tumor neoantigen according to the affinity score.
2. The method of claim 1, wherein the affinity score prediction model comprises an encoder and a decoder;
the inputting the leukocyte antigen sequence and the peptide fragment sequence into a trained affinity score prediction model to obtain affinity scores of the leukocyte antigen sequence and the peptide fragment sequence comprises:
encoding the leukocyte antigen sequence and the peptide fragment sequence through the encoder to obtain a sequence code;
determining, by the decoder, the affinity score from the sequence code.
3. The method of claim 2, wherein the encoding the leukocyte antigen sequence and the peptide fragment sequence through the encoder to obtain a sequence code comprises:
splicing the leukocyte antigen sequence and the peptide fragment sequence by the encoder to obtain a spliced sequence;
encoding the spliced sequence through the encoder to obtain the sequence code.
4. The method of claim 2, wherein the decoder is constructed based on a gating mechanism and an attention mechanism.
5. The method of claim 4, wherein the determining, by the decoder, the affinity score from the sequence code comprises:
extracting, by the decoder, target sequence features from the sequence code;
determining the affinity score according to the target sequence features.
6. The method of claim 5, wherein the extracting, by the decoder, target sequence features from the sequence code comprises:
extracting, by the decoder, sequence features from the sequence code based on the gating mechanism;
screening, by the decoder, the sequence features based on the attention mechanism to obtain screened sequence features;
performing, by the decoder, feature extraction on the screened sequence features based on the gating mechanism to obtain the target sequence features.
7. The method of claim 1, wherein determining whether the peptide fragment sequence is a tumor neoantigen based on the affinity score comprises:
and when the affinity score reaches a preset threshold value, determining the peptide fragment sequence as a tumor neoantigen.
8. A tumor neoantigen prediction device, comprising:
a sequence acquisition module, configured to acquire a leukocyte antigen sequence and a peptide fragment sequence of a tumor patient;
an affinity score acquisition module, configured to input the leukocyte antigen sequence and the peptide fragment sequence into a trained prediction model to obtain the affinity scores of the leukocyte antigen sequence and the peptide fragment sequence;
a tumor neoantigen determination module, configured to determine whether the peptide fragment sequence is a tumor neoantigen according to the affinity score.
9. A tumor neoantigen determination device, characterized by comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the tumor neoantigen prediction method according to any one of claims 1-7.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the tumor neoantigen prediction method according to any one of claims 1 to 7.
CN202110303245.1A 2021-03-22 2021-03-22 Tumor neogenesis antigen prediction method, device, equipment and medium Active CN112908421B (en)

Publications (2)

Publication Number Publication Date
CN112908421A true CN112908421A (en) 2021-06-04
CN112908421B CN112908421B (en) 2024-02-06

Family

ID=76105914

Country Status (1)

Country Link
CN (1) CN112908421B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114561472A (en) * 2022-04-27 2022-05-31 普瑞基准科技(北京)有限公司 Kit for detecting or assisting in detecting tumor-related gene variation and application thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant