CN110752041A - Method, device and storage medium for predicting neoantigen based on next generation sequencing - Google Patents

Info

Publication number
CN110752041A
Authority
CN
China
Prior art keywords
mutation
tumor
hla
neoantigen
peptide
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911011327.8A
Other languages
Chinese (zh)
Other versions
CN110752041B (en)
Inventor
但旭
李淼
陈超
王佳茜
高志博
聂新华
朱嘉麒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Yulce Biological Technology Co Ltd
Original Assignee
Shenzhen Yulce Biological Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Yulce Biological Technology Co Ltd
Priority to CN201911011327.8A
Publication of CN110752041A
Application granted
Publication of CN110752041B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/40 ICT specially adapted for the handling or processing of medical references relating to drugs, e.g. their side effects or intended usage
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Medical Informatics (AREA)
  • Genetics & Genomics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Primary Health Care (AREA)
  • Analytical Chemistry (AREA)
  • Epidemiology (AREA)
  • Toxicology (AREA)
  • Pharmacology & Pharmacy (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • Public Health (AREA)
  • Medicinal Chemistry (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

A method, device and storage medium for neoantigen prediction based on next generation sequencing, the method comprising: obtaining genome sequencing data of a tumor sample and a normal sample, and tumor transcriptome sequencing data; performing mutation detection on the genome sequencing data to obtain point mutations and insertion-deletion mutations, and performing fusion gene mutation detection on the tumor transcriptome sequencing data to obtain fusion gene mutations; detecting the HLA molecule types to obtain an HLA typing result in which the tumor sample matches the normal sample; annotating the point mutations, insertion-deletion mutations and fusion gene mutations from gene mutation to amino acid mutation; predicting peptide fragments for the point mutations, insertion-deletion mutations and fusion gene mutations to obtain the corresponding predicted mutant peptide fragments; and inputting the predicted mutant peptide fragments and the HLA typing result into a neoantigen prediction model, which scores and ranks them to obtain a neoantigen prediction result. The invention can accurately obtain high-quality neoantigens from sequencing data.

Description

Method, device and storage medium for predicting neoantigen based on next generation sequencing
Technical Field
The invention relates to the technical field of bioinformatics, and in particular to a method, a device and a storage medium for predicting neoantigens based on next generation sequencing.
Background
Tumor-specific antigens (TSAs) are antigens that are characteristic of tumor cells and are also known as neoantigens. Tumor-specific antigens were proposed in the first half of the last century. Later, with the development of molecular biology and a deeper understanding of the function of major histocompatibility complex (MHC) molecules, Boon et al. first discovered that, in tumors, complexes of tumor-produced specific peptides and MHC molecules can be recognized by T cells such as CD8+ or CD4+ T cells. Subsequent studies recognized that these T-cell-recognized antigens are derived from genomic variations of tumors expressed as tumor-specific peptides (neo-epitopes), and they were defined as neoantigens. Unlike tumor-associated antigens, tumor-specific antigens are present only in tumor cells.
Immune checkpoint inhibitor therapy has recently achieved great clinical success, especially in tumor patients with a high mutation load. When the tumor mutation load is high, more tumor neoantigens are expressed, so the tumor cells are more easily recognized and killed by T cells in vivo. The quantity and quality of tumor neoantigens therefore affect the first step of immunotherapy and play a critical role. In 2013, the journal Science ranked tumor immunotherapy first among the year's ten major scientific advances, and scientists including Rosenberg and Schreiber led a wave of research on tumor neoantigens. A landmark success was reported by the Rosenberg team in Science in May 2014: lymphocytes that were expanded in vitro and could specifically recognize an abnormal protein caused by a cancer cell gene mutation were used to successfully treat a patient with highly malignant advanced bile duct cancer. At the end of 2016, the Rosenberg team screened TIL cells targeting the tumor neoantigen produced by the KRAS G12D mutation; after expansion and reinfusion, these cells caused tumor regression, as published in the leading medical journal NEJM. In 2017, Catherine J. Wu and Ugur Sahin simultaneously reported in Nature that personalized tumor vaccines based on tumor neoantigens had passed early clinical trials. The detection of tumor neoantigens is therefore of great significance for immunotherapy.
The tumor neoantigen prediction pipelines published so far are mainly EpiToolKit and Epi-Seq. However, EpiToolKit starts only from the mutations: it does not consider the depth and coverage of the sequencing data, does not assess mutation quality from data quality, and therefore cannot judge the quality of the resulting neoantigens. In addition, EpiToolKit does not consider expression abundance or whether the neoantigen is expressed at all, which causes false positive predictions and prevents the screening of high-quality neoantigens. Many mutations at the DNA level are not expressed (on average, about 50% of mutations may not be expressed), and these may cause false positives in neoantigen prediction. Expression levels of mutations also vary, and in general the higher the expression, the stronger the immunogenicity. Furthermore, EpiToolKit does not compare the mutant peptide with the normal peptide. A high-quality neoantigen generally has higher affinity than the corresponding normal peptide, so the lack of this comparison also causes false positives when screening for high-quality neoantigens. Epi-Seq predicts tumor-specific antigens only from tumor expression data, which also causes false positives. First, RNA editing easily introduces false positives. Second, RNA sequencing is performed after reverse transcription into cDNA, and this process also introduces many false positives. Third, the methods that detect mutations by comparing tumor cDNA with germline DNA produce many false positives. Together, these factors result in more false positives among Epi-Seq-derived neoantigens.
Finally, current affinity-based neoantigen prediction approaches do not take into account whether the mutant peptide fragments are presented on the cell surface, whereas statistically fewer than 5% of peptide fragments are actually presented on the cell surface. Therefore, there is currently no method or pipeline that screens high-quality tumor neoantigens from multiple angles directly from sequencing alignment results.
Disclosure of Invention
The invention aims to provide a neoantigen prediction method, device and storage medium based on next generation sequencing, which can accurately obtain high-quality neoantigens from sequencing data, facilitate subsequent experimental verification and immunotherapy, and solve the problem that current affinity-based models cannot determine whether a mutant peptide fragment is presented on the cell surface.
According to a first aspect, there is provided in one embodiment a method for neoantigen prediction based on next-generation sequencing, comprising:
acquiring second-generation genome sequencing data of a tumor sample and a normal sample from the same individual and second-generation sequencing data of a tumor transcriptome;
carrying out mutation detection on the second-generation sequencing data of the genome to obtain tumor somatic point mutation and insertion deletion mutation as candidate mutation; performing fusion gene mutation detection on the second-generation sequencing data of the tumor transcriptome to obtain fusion gene mutation as candidate mutation;
detecting the HLA molecule types of the tumor sample and the normal sample to obtain an HLA typing result of the tumor sample matched with the normal sample;
annotating point mutations, indel mutations, and fusion gene mutations among the candidate mutations described above from gene mutation to amino acid mutation;
predicting peptide fragments of point mutation, insertion deletion mutation and fusion gene mutation in the candidate mutation based on the result obtained by the annotation to obtain corresponding mutation prediction peptide fragments;
inputting the mutation prediction peptide fragment and the HLA typing result into a neoantigen prediction model, wherein the neoantigen prediction model is a model fitted by machine learning using mass spectrometry detection data of neoantigens on the surface of tumor cells as positive data; and scoring and ranking with the neoantigen prediction model to obtain a neoantigen prediction result with scores arranged from high to low.
In a preferred embodiment, at least two mutation detection methods are used to perform mutation detection on the second-generation sequencing data of the genome, and the intersection detected by the two mutation detection methods is taken as a candidate mutation.
In a preferred embodiment, two HLA molecule type detection methods are used to detect the HLA molecule types of the tumor sample and the normal sample; if at least one of the detection methods finds that the HLA molecules of the tumor sample match those of the normal sample, an HLA typing result is output; otherwise, an empty result is output.
In a preferred embodiment, the prediction of peptide fragments of point mutation, insertion deletion mutation and fusion gene mutation among candidate mutations described above comprises:
taking a peptide fragment that contains the point-mutated amino acid and whose length falls within the third number of amino acids as the mutation prediction peptide fragment of the point mutation;
taking a peptide fragment that contains the amino acid at the position of the insertion deletion mutation and whose length falls within the third number of amino acids as the mutation prediction peptide fragment of the insertion deletion mutation;
taking a peptide fragment that contains the amino acid at the fusion site of the fusion gene mutation and whose length falls within the third number of amino acids as the mutation prediction peptide fragment of the fusion gene mutation.
In a preferred embodiment, the above-mentioned neo-antigen prediction model is obtained by training as follows:
acquiring neoantigen peptide fragment data, detected by mass spectrometry and presented on the surface of tumor cells, as positive training data, together with the HLA-I typing information corresponding to each neoantigen peptide fragment; keeping the peptide fragments whose length falls within the first number of amino acids, and randomly truncating the peptide fragments longer than the second number of amino acids to a length within the third number of amino acids;
acquiring protein sequence data of non-tumor patients, and randomly truncating the protein sequences into peptide fragments whose length falls within the third number of amino acids as negative training data;
performing one-hot encoding on the positive and negative training data to convert the peptide fragments into a first matrix; mapping the elements of the first matrix into a vector, processing the vector with a fully connected layer to obtain a vector whose length is the fifth number, normalizing that vector, and processing it with a further fully connected layer to obtain a vector whose length is the fourth number;
performing one-hot encoding on the HLA-I typing information to convert it into a second matrix, and compressing the second matrix into a vector whose length is the fourth number; and taking the dot product of the fourth-number-length vector corresponding to the positive and negative training data and the fourth-number-length vector corresponding to the HLA-I typing information, and outputting the result as the neoantigen score.
In a preferred embodiment, said first number is 6-16; the second number is 11; the third number is 9-11; the fourth number is 74; the fifth number is 256.
In a preferred embodiment, the HLA-I typing information includes alleles a-1, a-2, b-1, b-2, c-1 and c-2.
In a preferred embodiment, the score is defined as the probability that a peptide fragment is a neoantigen, and the loss function for each peptide fragment is: loss(i) = log(Bernoulli(y_i | Pr(peptide p_i presented))), where y_i is the class label (0 for negative, 1 for positive) and Pr denotes the probability that peptide p_i is presented as a neoantigen; this probability is substituted into the Bernoulli formula and the per-peptide values are accumulated.
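The Bernoulli term in the loss above can be illustrated with a minimal Python sketch. The function names are hypothetical, and one point is an assumption rather than something stated in the text: the accumulated log-probability is negated here so that a lower value means a better fit, which is the usual convention for training.

```python
import math

def bernoulli_log_prob(y, p):
    """log Bernoulli(y | p): log(p) for a positive label, log(1 - p) for a negative one."""
    return math.log(p) if y == 1 else math.log(1.0 - p)

def accumulated_loss(labels, probs):
    """Accumulate the per-peptide terms.

    Assumption: the sum is negated so that minimising it maximises the likelihood.
    """
    return -sum(bernoulli_log_prob(y, p) for y, p in zip(labels, probs))

# Two peptides, one positive and one negative, both predicted with probability 0.5.
example_loss = accumulated_loss([1, 0], [0.5, 0.5])
```

A confident, correct prediction (for example a positive peptide with a presentation probability close to 1) contributes a term close to zero, while a confident wrong prediction contributes a large term.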
According to a second aspect, an embodiment provides a neoantigen prediction device based on next generation sequencing, comprising:
the data acquisition unit is used for acquiring second-generation genome sequencing data of a tumor sample and a normal sample from the same individual and second-generation sequencing data of a tumor transcriptome;
a mutation detection unit for performing mutation detection on the second-generation genome sequencing data to obtain tumor somatic point mutation and insertion deletion mutation as candidate mutations; performing fusion gene mutation detection on the second-generation sequencing data of the tumor transcriptome to obtain fusion gene mutation as candidate mutation;
the HLA typing unit is used for detecting the HLA molecule types of the tumor sample and the normal sample to obtain an HLA typing result of the matching of the HLA molecules of the tumor sample and the normal sample;
a mutation annotation unit for performing annotation of gene mutation to amino acid mutation for point mutation, insertion deletion mutation and fusion gene mutation among the above candidate mutations;
a mutation peptide fragment prediction unit for predicting the peptide fragments of point mutation, insertion deletion mutation and fusion gene mutation in the candidate mutation based on the result obtained by the annotation to obtain corresponding mutation prediction peptide fragments;
a neoantigen prediction unit for inputting the mutation prediction peptide fragment and the HLA typing result into a neoantigen prediction model, wherein the neoantigen prediction model is a model fitted by machine learning using mass spectrometry detection data of neoantigens on the surface of tumor cells as positive data; and scoring and ranking with the neoantigen prediction model to obtain a neoantigen prediction result with scores arranged from high to low.
According to a third aspect, an embodiment provides a computer readable storage medium comprising a program executable by a processor to implement the method of the first aspect.
According to the method, mass spectrometry detection of neoantigens on the surface of tumor cells is used as positive data, and a model for identifying neoantigens is fitted by machine learning. High-quality neoantigens can thus be accurately obtained from sequencing data, which facilitates subsequent experimental verification and immunotherapy and solves the problem that current affinity-based models cannot determine whether a mutant peptide fragment is presented on the cell surface. Compared with the indirect indicator of affinity, the prediction result is more direct and avoids complex intermediate factors: because peptide fragments presented on the cell surface are used directly for training, the resulting neoantigens are of high quality, with higher sensitivity and specificity, which facilitates subsequent experimental verification or tumor vaccine therapy.
Drawings
FIG. 1 is a flow chart of a method for neoantigen prediction based on next generation sequencing of neoantigen peptides according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a hierarchy of a prediction model of neoantigens in an embodiment of the present invention;
FIG. 3 is a block diagram of an apparatus for prediction of neoantigens based on next-generation sequencing of neoantigen peptides according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following detailed description and accompanying drawings. In the following description, numerous details are set forth in order to provide a better understanding of the present application. However, those skilled in the art will readily recognize that some of the features may be omitted or replaced with other elements, materials, methods in different instances.
Furthermore, the features, operations, or characteristics described in the specification may be combined in any suitable manner to form various embodiments. Also, the various steps or actions in the method descriptions may be interchanged or reordered, as will be apparent to one of ordinary skill in the art. Thus, the various sequences in the specification and drawings are for the purpose of describing certain embodiments only and are not intended to imply a required sequence, unless it is otherwise indicated that such a sequence must be followed.
The ordinal numbers used herein to describe a technical feature, such as "first", "second", etc., are used solely to distinguish between the objects described and do not have any sequential or technical meaning.
In view of the problems in the prior art, the present invention aims to provide a method for predicting neoantigens based on next-generation sequencing, which solves the problem that high-quality neoantigens cannot currently be screened from multiple angles. The invention uses the mass spectrometry detection results of neoantigens on the surface of tumor cells as positive data and uses machine learning to fit a model for identifying neoantigens. It accurately obtains high-quality neoantigens from sequencing data, facilitates subsequent experimental verification and immunotherapy, and solves the problem that current affinity-based models cannot determine whether a mutant peptide fragment is presented on the cell surface.
As shown in fig. 1, in one embodiment of the present invention, a method for prediction of neoantigens based on next-generation sequencing of neoantigen peptides is provided, which comprises the following steps:
S101: data acquisition step
Acquire second-generation genome sequencing data of a tumor sample and a normal sample from the same individual, and second-generation sequencing data of the tumor transcriptome.
In the embodiment of the invention, the tumor sample and the normal sample of the same tested individual need to be detected simultaneously. The subject may, for example, be an individual who has been clinically diagnosed as a tumor patient. The tumor sample generally refers to a sample derived from the affected part or tissue of a tumor patient, such as a lung tissue sample of a lung cancer patient. The normal sample refers to a normal sample derived from a non-diseased part or tissue of the same tumor patient, for example, a leukocyte sample isolated from peripheral blood.
In the present embodiment, the second-generation genome sequencing data of the tumor sample and the normal sample are generally first aligned to the reference genome. Therefore, in a preferred embodiment, the data acquisition step acquires alignment files in which the second-generation genome sequencing data of the tumor sample and the normal sample have been aligned to the reference genome.
S102: mutation detection step
Perform mutation detection on the second-generation genome sequencing data to obtain tumor somatic point mutations and insertion deletion mutations as candidate mutations; and perform fusion gene mutation detection on the second-generation sequencing data of the tumor transcriptome to obtain fusion gene mutations as candidate mutations.
In the embodiments of the invention, various techniques are suitable for performing mutation detection on the second-generation genome sequencing data. In a preferred embodiment, at least two mutation detection methods are used to perform mutation detection on the second-generation genome sequencing data, and the intersection of the mutations detected by the two methods is taken as the candidate mutations. For example, in one embodiment, two mutation detection programs (MuTect and VarScan) are used to detect tumor somatic point mutations and insertion deletion mutations in the alignment files of the sequencing results of the tumor sample and the normal sample, and the intersection of the mutations detected by the two programs is used as the candidate mutations. Meanwhile, fusion gene mutation detection is performed on the alignment file of the tumor transcriptome sequencing results, and the detected fusion gene mutations are also used as candidate mutations.
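Taking the intersection of two callers' results can be sketched as a set operation. In the sketch below the `Variant` record, the function name and the coordinates are all hypothetical; a real pipeline would parse the callers' output files and would probably need to normalise insertion deletion representations before comparing them.

```python
from typing import NamedTuple, Set

class Variant(NamedTuple):
    """Hypothetical minimal variant record; real caller output carries more fields."""
    chrom: str
    pos: int   # 1-based position on the reference genome
    ref: str
    alt: str

def candidate_mutations(calls_a: Set[Variant], calls_b: Set[Variant]) -> Set[Variant]:
    """Keep only the variants reported by both callers."""
    return calls_a & calls_b

# Made-up calls from two callers; only the first variant is shared.
caller_a = {Variant("chr1", 100, "C", "T"), Variant("chr2", 200, "G", "A")}
caller_b = {Variant("chr1", 100, "C", "T"), Variant("chr3", 300, "A", "AT")}
candidates = candidate_mutations(caller_a, caller_b)
```

Requiring agreement between callers trades some sensitivity for fewer false positive candidates.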
S103: HLA typing step
Detect the HLA molecule types of the tumor sample and the normal sample to obtain an HLA typing result in which the HLA molecules of the tumor sample match those of the normal sample.
In the embodiments of the present invention, various techniques are suitable for detecting the HLA molecule types of the tumor sample and the normal sample. In a preferred embodiment, two HLA molecule type detection methods are used to detect the HLA molecule types of the tumor sample and the normal sample; if at least one of the detection methods finds that the HLA molecules of the tumor sample match those of the normal sample, an HLA typing result is output; otherwise, an empty result is output. For example, in one embodiment, the HLA molecule types of the tumor sample and the normal sample are detected with POLYSOLVER and with BWA-MEM, respectively. If the HLA molecules of the tumor sample detected by POLYSOLVER match those of the normal sample, the HLA subtype result is output. If not, the match between the tumor sample and the normal sample is checked in the BWA-MEM results: if they match, the HLA subtype result from BWA-MEM is output, and if they still do not match, an empty result is output.
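The fallback logic of this step can be sketched as follows. The function names are hypothetical, and "matching" is assumed here to mean that the tumor and normal samples yield the same set of alleles; the text does not define the comparison more precisely.

```python
from typing import List, Optional

def hla_matched(tumor: List[str], normal: List[str]) -> bool:
    """Assumption: a match means both samples carry exactly the same alleles."""
    return sorted(tumor) == sorted(normal)

def select_hla_typing(primary_tumor: List[str], primary_normal: List[str],
                      fallback_tumor: List[str], fallback_normal: List[str]) -> Optional[List[str]]:
    """Use the primary method's result if tumor and normal agree, otherwise the
    fallback method's result; return None (the empty result) if neither agrees."""
    if hla_matched(primary_tumor, primary_normal):
        return sorted(primary_tumor)
    if hla_matched(fallback_tumor, fallback_normal):
        return sorted(fallback_tumor)
    return None

# Illustrative input: the primary method disagrees between samples, the fallback agrees.
typing = select_hla_typing(
    ["A*02:01", "A*24:02"], ["A*02:01", "A*11:01"],
    ["A*24:02", "A*02:01"], ["A*02:01", "A*24:02"],
)
```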
S104: mutation annotation procedure
Annotate the point mutations, insertion deletion mutations and fusion gene mutations among the candidate mutations from gene mutation to amino acid mutation.
Specifically, the base changes in the point mutations, insertion deletion mutations and fusion gene mutations are converted into amino acid changes in the peptide fragment, that is, a correspondence is established between each base mutation and the resulting amino acid mutation. In the embodiments of the present invention, various techniques are suitable for annotating point mutations, insertion deletion mutations and fusion gene mutations as amino acid mutations, and such annotation can be carried out according to the usual practice in the art.
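To make the base-to-amino-acid correspondence concrete, here is a toy Python sketch for a single point mutation in a coding sequence. It is not the annotation tool used in the embodiment: the codon table is deliberately partial (only the four codons the example needs), and the function name, the 0-based coordinate convention and the short sequence are assumptions made for illustration.

```python
# Partial codon table: only the codons used in this example (standard genetic code).
CODON_TABLE = {"GCT": "A", "GGT": "G", "GAT": "D", "GTT": "V"}

def annotate_point_mutation(cds: str, cds_pos: int, alt_base: str) -> str:
    """Return an amino acid change such as 'G2D' for a base substitution.

    cds_pos is a 0-based offset into the coding sequence (an assumption of this sketch).
    """
    codon_index = cds_pos // 3
    codon = cds[codon_index * 3:codon_index * 3 + 3]
    offset = cds_pos % 3
    mutated = codon[:offset] + alt_base + codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutated]
    return f"{ref_aa}{codon_index + 1}{alt_aa}"

# Toy coding sequence Ala-Gly-Val; the middle base of the second codon changes G to A,
# turning GGT (Gly) into GAT (Asp).
change = annotate_point_mutation("GCTGGTGTT", 4, "A")
```

Insertion deletion and fusion mutations change the downstream reading frame or join two transcripts, so they need the full transcript context rather than a single-codon lookup.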
S105: mutant peptide fragment prediction step
Based on the annotation results, predict peptide fragments for the point mutations, insertion deletion mutations and fusion gene mutations among the candidate mutations to obtain the corresponding mutation prediction peptide fragments.
In a preferred embodiment of the present invention, the mutant peptide fragment prediction step covers the point mutations, insertion deletion mutations and fusion gene mutations among the candidate mutations, and specifically comprises: taking a peptide fragment that contains the point-mutated amino acid and whose length falls within the third number of amino acids as the mutation prediction peptide fragment of the point mutation; taking a peptide fragment that contains the amino acid at the position of the insertion deletion mutation and whose length falls within the third number of amino acids as the mutation prediction peptide fragment of the insertion deletion mutation; and taking a peptide fragment that contains the amino acid at the fusion site of the fusion gene mutation and whose length falls within the third number of amino acids as the mutation prediction peptide fragment of the fusion gene mutation. The third number is 9-11.
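One way to enumerate such peptide fragments is a sliding window over the mutant protein sequence. The sketch below assumes a 0-based index for the mutated (or fusion-site) residue and uses a made-up 21-residue sequence; the function name is hypothetical.

```python
def mutant_peptides(protein, mut_index, min_len=9, max_len=11):
    """Return every window of length min_len..max_len that contains position mut_index."""
    peptides = []
    for length in range(min_len, max_len + 1):
        first = max(0, mut_index - length + 1)           # leftmost start that still covers the site
        last = min(mut_index, len(protein) - length)     # rightmost start that fits in the protein
        for start in range(first, last + 1):
            peptides.append(protein[start:start + length])
    return peptides

# Made-up sequence; the residue of interest is the 'M' at index 10.
protein = "ACDEFGHIKLMNPQRSTVWYA"
windows = mutant_peptides(protein, 10)
```

With enough flanking sequence on both sides, a single site yields 9 + 10 + 11 = 30 candidate fragments.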
S106: neoantigen prediction procedure
Input the mutation prediction peptide fragments and the HLA typing result into a neoantigen prediction model, wherein the neoantigen prediction model is a model fitted by machine learning using mass spectrometry detection data of neoantigens on the surface of tumor cells as positive data; score and rank with the neoantigen prediction model to obtain a neoantigen prediction result with scores arranged from high to low.
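The final ranking is a sort on the model score. A minimal sketch, with a hypothetical function name and made-up peptides and scores standing in for the model output:

```python
def rank_candidates(scored_peptides):
    """Sort (peptide, score) pairs from the highest score to the lowest."""
    return sorted(scored_peptides, key=lambda item: item[1], reverse=True)

# Made-up scores for three candidate peptide fragments.
ranked = rank_candidates([("ACDEFGHIK", 0.20), ("LMNPQRSTV", 0.90), ("GHIKLMNPQR", 0.50)])
```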
In a preferred embodiment of the present invention, the neoantigen prediction model is trained by the following method:
(a) Acquire neoantigen peptide fragment data, detected by mass spectrometry and presented on the surface of tumor cells, as positive training data, together with the HLA-I typing information corresponding to each neoantigen peptide fragment; keep the peptide fragments whose length falls within the first number of amino acids, and randomly truncate the peptide fragments longer than the second number of amino acids to a length within the third number of amino acids.
In a preferred embodiment of the present invention, the neoantigen peptide fragment data come from a tumor neoantigen peptide mass spectrometry database, which can be downloaded from https://www.nature.com/articles/nbt.4313#supplementary-information. The first number is 6-16, the second number is 11, and the third number is 9-11. That is, the positive training data and the HLA-I typing information corresponding to each tumor neoantigen peptide fragment are obtained from the tumor neoantigen peptide mass spectrometry database: peptide fragments of 6-16 amino acids are kept, and peptide fragments longer than 11 amino acids are randomly truncated to a length of 9-11 amino acids. These lengths are used because HLA-I can only accommodate peptide fragments within this length range. The data are tumor neoantigen peptide fragments detected by mass spectrometry and presented on the surface of tumor cells.
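The length handling for the positive peptides can be sketched as below. The function name and the use of a seeded `random.Random` are choices made for this illustration; the text does not say how the random truncation position or length is drawn, so drawing both uniformly is an assumption.

```python
import random

def prepare_positive(peptide, rng, keep=(6, 16), cut_above=11, target=(9, 11)):
    """Apply the length rules: drop peptides outside 6-16 residues, keep those of
    at most 11 residues unchanged, and randomly truncate longer ones to 9-11 residues."""
    if not keep[0] <= len(peptide) <= keep[1]:
        return None
    if len(peptide) <= cut_above:
        return peptide
    length = rng.randint(target[0], target[1])        # assumption: uniform choice of length
    start = rng.randint(0, len(peptide) - length)     # assumption: uniform choice of offset
    return peptide[start:start + length]

rng = random.Random(0)
too_short = prepare_positive("ACDEF", rng)            # 5 residues: discarded
unchanged = prepare_positive("ACDEFGHIK", rng)        # 9 residues: kept as is
truncated = prepare_positive("ACDEFGHIKLMNPQ", rng)   # 14 residues: cut to 9-11
```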
(b) Acquire protein sequence data of non-tumor patients, and randomly truncate the protein sequences into peptide fragments whose length falls within the third number of amino acids to serve as negative training data.
In a preferred embodiment of the invention, the non-tumor patient protein sequence data come from the SwissProt protein database. For example, using the non-tumor patient protein data in the SwissProt database, protein sequences are randomly truncated into peptide fragments 9-11 amino acids long and used as the negative data set for training.
(c) Perform one-hot encoding on the positive and negative training data to convert the peptide fragments into the first matrix.
Specifically, the positive and negative peptide fragments are one-hot encoded, converting each peptide fragment into a matrix that is input into the model for calculation. For example, in a preferred embodiment of the present invention, one-hot encoding means that each position of the 11-position peptide has 21 possibilities (20 amino acids plus the padding symbol X); each position is represented by a vector of length 21 in which only the entry for the corresponding amino acid is 1 and the rest are 0, so an 11 × 21 matrix represents the 11-position peptide.
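The 11 × 21 encoding can be sketched in plain Python. The column order of the alphabet and padding on the right are assumptions of this sketch; the text only states that there are 20 amino acids plus X and that short peptides are filled with X.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"   # 20 amino acids plus the padding symbol X (order assumed)
MAX_LEN = 11

def one_hot_peptide(peptide):
    """Encode a peptide of up to 11 residues as an 11 x 21 matrix of 0/1 values."""
    if len(peptide) > MAX_LEN:
        raise ValueError("peptide longer than 11 residues")
    padded = peptide.ljust(MAX_LEN, "X")   # assumption: pad on the right
    matrix = []
    for residue in padded:
        row = [0] * len(ALPHABET)
        row[ALPHABET.index(residue)] = 1
        matrix.append(row)
    return matrix

encoded = one_hot_peptide("ACDEFGHIK")   # 9 residues, so two padding rows are added
```

Each row contains exactly one 1, and the last two rows of `encoded` mark the padding column.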
(d) Map the elements of the matrix of the positive and negative training data into a vector, process that vector with a fully connected layer to obtain a vector whose length is the fifth number, normalize that vector, and then process it with a further fully connected layer to convert the normalized fifth-number-length vector into a vector whose length is the fourth number.
In a preferred embodiment of the invention, the fourth number is 74 and the fifth number is 256. In a preferred embodiment of the present invention, the 6 HLA-I alleles a-1, a-2, b-1, b-2, c-1 and c-2 corresponding to each neoantigen peptide fragment are obtained from the neoantigen peptide mass spectrometry database. Each HLA-I allele has 74 possibilities (74 HLA-I genotypes), and these 74 HLA-I genotypes cover the majority of the human population; if an allele is not among the 74, the following steps are not performed.
(e) One-hot encoding is performed on the HLA-I typing information to convert it into a second matrix, and the second matrix is compressed into a vector whose length is the fourth number.
Specifically, one-hot encoding here means that each of the 6 HLA-I alleles has 74 possibilities (74 HLA-I genotypes); each allele is represented by a vector of length 74 in which only the entry for the corresponding HLA-I genotype is 1 and the rest are 0. The 6 HLA-I alleles are thus represented by a 6 × 74 matrix, which is then compressed into a vector of length 74.
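The HLA-I encoding can be illustrated as below. The text does not state how the 6 × 74 matrix is compressed, so the element-wise maximum used here is an assumption (a column sum would be another plausible choice). The allele names are placeholders rather than real HLA nomenclature, and an allele outside the known list is given an all-zero row, as in the layer description of this document.

```python
def one_hot_hla(alleles, known_alleles):
    """Encode six alleles as a 6 x 74 matrix; unknown alleles give all-zero rows."""
    index = {name: i for i, name in enumerate(known_alleles)}
    matrix = []
    for allele in alleles:
        row = [0] * len(known_alleles)
        if allele in index:
            row[index[allele]] = 1
        matrix.append(row)
    return matrix


def compress(matrix):
    """Collapse the rows into one vector (assumed rule: element-wise maximum)."""
    return [max(column) for column in zip(*matrix)]


known = [f"HLA-{i:02d}" for i in range(74)]  # placeholder names for 74 genotypes
patient = [known[0], known[0], known[5], known[9], known[40], "HLA-unknown"]
matrix = one_hot_hla(patient, known)
vector = compress(matrix)
```

With the maximum rule, a homozygous locus (the same allele twice) contributes a single 1, so the compressed vector marks which genotypes are present rather than how many copies there are.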
(f) The dot product is taken between the vector of the fourth-number length corresponding to the positive or negative training data and the vector of the fourth-number length corresponding to the HLA-I typing information, and the resulting score is output as the neoantigen score.
The above steps (c) to (f) are exemplified below.
As shown in FIG. 2, in a preferred embodiment of the present invention, the model hierarchy includes:
(1.1) Input Layer: the peptide fragment is input; if the input peptide fragment has fewer than 11 positions, it is padded with X.
(1.2) Lambda layer: the peptide fragment from the previous layer is one-hot encoded, and an 11 × 21 matrix (20 amino acids plus the padding symbol X) is output.
(1.3) Flatten layer: the 11 × 21 matrix is input and its elements are mapped into a vector of length 231, to facilitate the calculation of the subsequent fully connected layer. Layers (1.1) to (1.3) of the model hierarchy contain no parameters to be trained; they only transform the data to facilitate the training in the subsequent steps.
(1.4) Dense layer: the vector of length 231 is input and processed by a fully connected layer containing 256 neurons, and a vector of length 256 is output. The figure of 256 neurons is an empirical value from machine learning practice. Here, fully connected processing means that each element of the length-231 vector is multiplied by the weights of each of the 256 neurons and the products are summed, giving a vector of length 256.
(1.5) Batch Normalization: the vector of length 256 is input and normalized, which prevents vanishing gradients and overfitting and accelerates convergence during learning; the normalized vector of length 256 is output.
(1.6) Dense layer: the normalized vector of length 256 is input and processed by a fully connected layer containing 74 neurons, and a vector of length 74 is output. The number 74 is used because the neoantigen peptide mass spectrometry database contains 74 HLA-I genotypes; HLA-I genotypes outside these 74 are not considered. Here, fully connected processing means that each element of the length-256 vector is multiplied by the weights of each of the 74 neurons and the products are summed, giving a vector of length 74.
(2.1) Input Layer: the information on the 6 HLA-I alleles corresponding to each neoantigen peptide fragment is input.
(2.2) Embedding layer: the 6 HLA-I types input from the previous layer are one-hot encoded; an HLA-I type that is not among the 74 HLA-I genotypes is encoded as an all-zero vector. A 6 × 74 matrix is output (the mass spectrometry database contains 74 HLA-I genotypes).
(2.3) Lambda layer: the one-hot encoding from the previous layer is input, and a vector compressed to length 74 is output.
(3.1) The results of layers (1.6) and (2.3) are input; the dot product of the length-74 vector corresponding to the positive or negative training data and the length-74 vector corresponding to the HLA-I typing information is computed and output as the neoantigen score.
In a preferred embodiment of the invention, this score is defined as the probability of being a neoantigen, and the loss function for each peptide fragment is $\mathrm{loss}(i) = -\log\bigl(\mathrm{Bernoulli}\bigl(y_i \mid \Pr(\text{peptide } p_i \text{ presented})\bigr)\bigr)$, where $y_i$ is the classification value (0 for negative, 1 for positive). For example, if a negative peptide fragment is assigned the score a, its loss is −log(1 − a). The global loss function of the model is the sum of the per-peptide losses, and training aims to make this global loss as small as possible. Gradient descent is used: derivatives are obtained by backpropagation and the parameters are adjusted at each step until the loss function no longer decreases, at which point training is complete.
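The data flow through layers (1.3) to (3.1) can be traced with the following sketch. It is not the trained model: the weights are random, the batch normalization uses placeholder statistics (mean 0, variance 1), no activation functions are applied because none are named in the text, and the final sigmoid that maps the dot product to a value between 0 and 1 is an assumption.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: one weighted sum of the inputs per output neuron."""
    return [sum(xi * wi for xi, wi in zip(x, neuron_weights)) + b
            for neuron_weights, b in zip(weights, bias)]


def batch_norm(x, mean, var, eps=1e-3):
    """Normalize each feature with the given (here placeholder) statistics."""
    return [(xi - m) / math.sqrt(v + eps) for xi, m, v in zip(x, mean, var)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init(n_in, n_out, rng):
    """Random small weights and zero biases, standing in for trained values."""
    weights = [[rng.gauss(0.0, 0.05) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
w1, b1 = init(231, 256, rng)  # layer (1.4)
w2, b2 = init(256, 74, rng)   # layer (1.6)

# Flattened one-hot peptide: 11 positions x 21 symbols, a single 1 per position.
flat = [0.0] * 231
for position in range(11):
    flat[position * 21 + rng.randrange(21)] = 1.0

# Compressed HLA-I vector with ones at six arbitrary genotype indices.
hla = [0.0] * 74
for genotype_index in (0, 5, 9, 40, 41, 73):
    hla[genotype_index] = 1.0

hidden = dense(flat, w1, b1)                                # length 256
normalized = batch_norm(hidden, [0.0] * 256, [1.0] * 256)   # length 256
peptide_vector = dense(normalized, w2, b2)                  # length 74
logit = sum(p * h for p, h in zip(peptide_vector, hla))     # layer (3.1)
score = sigmoid(logit)                                      # assumed squashing
```

Because the HLA-I vector contains only zeros and ones, the dot product simply adds up the entries of the peptide vector that belong to the patient's genotypes.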
Those skilled in the art will appreciate that all or part of the functions of the various methods in the above embodiments may be implemented by hardware, or may be implemented by computer programs. When all or part of the functions of the above embodiments are implemented by a computer program, the program may be stored in a computer-readable storage medium, and the storage medium may include: a read only memory, a random access memory, a magnetic disk, an optical disk, a hard disk, etc., and the program is executed by a computer to realize the above functions. For example, the program may be stored in a memory of the device, and when the program in the memory is executed by the processor, all or part of the functions described above may be implemented. In addition, when all or part of the functions in the above embodiments are implemented by a computer program, the program may be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and may be downloaded or copied to a memory of a local device, or may be version-updated in a system of the local device, and when the program in the memory is executed by a processor, all or part of the functions in the above embodiments may be implemented.
In one embodiment of the present invention, a neoantigen prediction device based on next generation sequencing is provided, as shown in FIG. 3, including: a data acquisition unit 301, configured to acquire second-generation genome sequencing data of a tumor sample and a normal sample from the same individual, and second-generation sequencing data of the tumor transcriptome; a mutation detection unit 302, configured to perform mutation detection on the second-generation genome sequencing data to obtain tumor somatic point mutations and insertion-deletion mutations as candidate mutations, and to perform fusion gene mutation detection on the second-generation sequencing data of the tumor transcriptome to obtain fusion gene mutations as candidate mutations; an HLA typing unit 303, configured to detect the HLA molecule types of the tumor sample and the normal sample and obtain an HLA typing result in which the HLA molecules of the tumor sample match those of the normal sample; a mutation annotation unit 304, configured to annotate the point mutations, insertion-deletion mutations and fusion gene mutations among the candidate mutations from gene mutation to amino acid mutation; a mutation peptide fragment prediction unit 305, configured to predict, based on the annotation results, the peptide fragments of the point mutations, insertion-deletion mutations and fusion gene mutations among the candidate mutations to obtain the corresponding mutation prediction peptide fragments; and a neoantigen prediction unit 306, configured to input the mutation prediction peptide fragments and the HLA typing result into a neoantigen prediction model, the model being obtained by machine-learning fitting with mass spectrometry detection data of neoantigens on the tumor cell surface as positive data, and to score and rank the peptide fragments with the neoantigen prediction model to obtain neoantigen prediction results ordered by score from high to low.
In one embodiment of the invention, a computer-readable storage medium is provided that includes a program executable by a processor to perform a method for neoantigen prediction based on next generation sequencing, the method comprising: acquiring second-generation genome sequencing data of a tumor sample and a normal sample from the same individual and second-generation sequencing data of the tumor transcriptome; performing mutation detection on the second-generation genome sequencing data to obtain tumor somatic point mutations and insertion-deletion mutations as candidate mutations; performing fusion gene mutation detection on the second-generation sequencing data of the tumor transcriptome to obtain fusion gene mutations as candidate mutations; detecting the HLA molecule types of the tumor sample and the normal sample to obtain an HLA typing result in which the tumor sample matches the normal sample; annotating the point mutations, insertion-deletion mutations and fusion gene mutations among the candidate mutations from gene mutation to amino acid mutation; predicting, based on the annotation results, the peptide fragments of the point mutations, insertion-deletion mutations and fusion gene mutations among the candidate mutations to obtain the corresponding mutation prediction peptide fragments; inputting the mutation prediction peptide fragments and the HLA typing result into a neoantigen prediction model, the model being obtained by machine-learning fitting with mass spectrometry detection data of neoantigens on the tumor cell surface as positive data; and scoring and ranking with the neoantigen prediction model to obtain neoantigen prediction results ordered by score from high to low.
The technical solutions of the present invention are described in detail by the following specific examples, and it should be understood that the examples are only illustrative and should not be construed as limiting the scope of the present invention.
Example 1: model training
1. Data preparation
Positive training data are obtained from the neoantigen peptide mass spectrometry database: peptide fragments of 6-16 amino acids are taken, and peptide fragments longer than 11 amino acids are randomly cut to a length of 9-11 amino acids; these are neoantigen data verified by mass spectrometry as presented on the tumor cell surface. Using the data in the SwissProt protein database, protein sequences are randomly cut into peptide fragments 9-11 amino acids in length, which serve as the negative data set for training. The HLA-I typing information corresponding to each neoantigen peptide fragment is obtained from the mass spectrometry database. The negative and positive peptide fragments are one-hot encoded, the HLA-I typing information is one-hot encoded, and both are input into the model.
2. Model training module
The model is trained using the data prepared above and comprises the following components:
(1.1) Input Layer: the peptide fragment is input; if the input peptide fragment has fewer than 11 positions, it is padded with X.
(1.2) Lambda layer: the peptide fragment from the previous layer is input and one-hot encoded, and an 11 × 21 matrix (20 amino acids plus the padding symbol X) is output.
(1.3) Flatten layer: the 11 × 21 matrix is input and its elements are mapped into a vector of length 231, to facilitate the calculation of the subsequent fully connected layer. Layers (1.1) to (1.3) of the model hierarchy contain no parameters to be trained; they only transform the data to facilitate the training in the subsequent steps.
(1.4) Dense layer: the vector of length 231 is input and processed by a fully connected layer containing 256 neurons, and a vector of length 256 is output. The figure of 256 neurons is an empirical value from machine learning practice. Here, fully connected processing means that each element of the length-231 vector is multiplied by the weights of each of the 256 neurons and the products are summed, giving a vector of length 256.
(1.5) Batch Normalization: the vector of length 256 is input and normalized, which prevents vanishing gradients and overfitting and accelerates convergence during learning; the normalized vector of length 256 is output.
(1.6) Dense layer: the normalized vector of length 256 is input and processed by a fully connected layer containing 74 neurons, and a vector of length 74 is output. The number 74 is used because the neoantigen peptide mass spectrometry database contains 74 HLA-I genotypes; HLA-I genotypes outside these 74 are not considered. Here, fully connected processing means that each element of the length-256 vector is multiplied by the weights of each of the 74 neurons and the products are summed, giving a vector of length 74.
(2.1) Input Layer: the information on the 6 HLA-I alleles corresponding to each neoantigen peptide fragment is input.
(2.2) Embedding layer: the 6 HLA-I types input from the previous layer are one-hot encoded; an HLA-I type that is not among the 74 HLA-I genotypes is encoded as an all-zero vector. A 6 × 74 matrix is output (the mass spectrometry database contains 74 HLA-I genotypes).
(2.3) Lambda layer: the one-hot encoding from the previous layer is input, and a vector compressed to length 74 is output.
(3.1) The results of layers (1.6) and (2.3) are input; the dot product of the length-74 vector corresponding to the positive or negative training data and the length-74 vector corresponding to the HLA-I typing information is computed and output as the neoantigen score.
In a preferred embodiment of the invention, this score is defined as the probability of being a neoantigen, and the loss function for each peptide fragment is $\mathrm{loss}(i) = -\log\bigl(\mathrm{Bernoulli}\bigl(y_i \mid \Pr(\text{peptide } p_i \text{ presented})\bigr)\bigr)$, where $y_i$ is the classification value (0 for negative, 1 for positive). For example, if a negative peptide fragment is assigned the score a, its loss is −log(1 − a). The global loss function of the model is the sum of the per-peptide losses, and training aims to make this global loss as small as possible. Gradient descent is used: derivatives are obtained by backpropagation and the parameters are adjusted at each step until the loss function no longer decreases, at which point training is complete.
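The per-peptide loss and the global loss described above can be written out as a short sketch. The labels and probabilities are made-up values. The small descent loop assumes that the probability is a sigmoid of a single adjustable number, which is a simplification for illustration and not the network of the embodiment.

```python
import math


def bernoulli_loss(y, p):
    """-log Bernoulli(y | p): -log(p) for a positive, -log(1 - p) for a negative."""
    return -math.log(p if y == 1 else 1.0 - p)


def global_loss(labels, probabilities):
    """Sum of the per-peptide losses."""
    return sum(bernoulli_loss(y, p) for y, p in zip(labels, probabilities))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


labels = [1, 0, 0]               # made-up classification values
probabilities = [0.9, 0.2, 0.6]  # made-up model scores
total = global_loss(labels, probabilities)

# Toy gradient descent on one negative example whose score is sigmoid(logit).
logit = 1.0
history = []
for _ in range(50):
    p = sigmoid(logit)
    history.append(bernoulli_loss(0, p))
    logit -= 0.5 * (p - 0)  # d(loss)/d(logit) = p - y when p = sigmoid(logit)
```

Each step lowers the score of the negative example, so the recorded loss decreases from one iteration to the next.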
3. Output model
The output model assigns a neoantigen score to each peptide fragment, and the results are ranked by score from high to low.
Example 2: prediction of neoantigens
In this example, samples provided by TESLA (Tumor Neoantigen Selection Alliance) were used, for which it has been experimentally verified whether the neoantigen peptide fragments can bind to the corresponding HLA. The experimental verification is based on tetramer technology: the reaction between pMHC (peptide fragment/MHC complex) and T cells is examined to identify positive and negative peptide fragments.
The 5 samples are numbered 1, 2, 10, 103 and 210. The specific steps of sample detection in this example are as follows: the HLA-I types provided for each sample are obtained; all mutant peptide fragments of each sample are obtained, the fragments containing the mutant amino acid are cut to a length of 11 amino acids, and any fragment shorter than 11 amino acids is padded with X; the fragments are input into the trained neoantigen prediction model; the output is sorted by neoantigen score, and peptide fragments with a score greater than 0.1 are taken as positive neoantigen results.
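The cutting, padding and thresholding steps of this example can be sketched as follows. The function names, the protein sequence, the mutation position and the scores are all hypothetical; in practice the scores would come from the trained neoantigen prediction model.

```python
def windows_containing(protein, mutation_index, length=11):
    """All fragments of the given length that cover the mutated residue."""
    first = max(0, mutation_index - length + 1)
    last = min(mutation_index, len(protein) - length)
    return [protein[start:start + length] for start in range(first, last + 1)]


def pad(peptide, length=11):
    """Fill a fragment shorter than 11 residues with X (right-padding assumed)."""
    return peptide.ljust(length, "X")


def rank_positive(scored, threshold=0.1):
    """Keep fragments scoring above the threshold, highest score first."""
    kept = [(peptide, score) for peptide, score in scored if score > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


protein = "MKTAYIAKQRQISFVKSH"  # made-up 18-residue sequence
windows = windows_containing(protein, mutation_index=12)

scored = [                      # made-up scores standing in for model output
    ("AAAAAAAAAXX", 0.05),
    ("CCCCCCCCCCC", 0.7),
    ("DDDDDDDDDDX", 0.3),
]
positives = rank_positive(scored)
```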
As a control, the published software MHCpan was used, with the mutant peptide fragments of each sample as input. Its model, trained on the existing affinity data set (IEDB), outputs an affinity result for each peptide fragment, and the positive neoantigen results given by the model under default parameters were taken.
Table 1 shows the test results for the five TESLA patients in this example; the neoantigen prediction model of the invention performs better than the commonly used software MHCpan. The second column of Table 1 gives the number of positive/negative peptide fragments verified in the TESLA data.
TABLE 1
[Table 1: image in the original publication (BDA0002244271630000111, BDA0002244271630000121)]
PPV (positive predictive value), referred to below as the true positive rate, is the number of true positive peptide fragments divided by the number of predicted positive peptide fragments, expressed as a percentage.
Taking patient No. 1 as an example, the TESLA verification data cover 479 peptide fragments, of which 2 are verified positive. Applying the neoantigen prediction method of the invention to patient No. 1 yields the 20 top-scoring neoantigen peptide fragments, which include the 2 positive peptide fragments verified by TESLA, giving a true positive rate of 10%; the true positive rate of MHCpan for this patient is only 7.69%. Similarly, the true positive rates obtained by the method of the invention for patients No. 10 and No. 103 exceed those of MHCpan, and overall the method of the invention is superior to MHCpan. Table 2 shows the prediction results for 5 peptide fragments of patient No. 210 in this example. Table 2 lists positive and negative peptide fragments verified by TESLA; the predictions of the method of the invention are consistent with TESLA, whereas those of MHCpan are not fully consistent. In particular, the peptide fragment verified as negative in row 7 of Table 2 (VYCEEYYLFS) and the one verified as positive in row 6 (VYCEEYYLF) have very similar sequences, yet the neoantigen prediction model of the invention distinguishes them correctly, whereas MHCpan cannot distinguish them and classifies both as negative.
TABLE 2
              |                  | TESLA | Neoantigen prediction method of the invention | MHCpan
Sample number | Peptide fragment | Type  | Prediction                                    | Prediction
210           | AEYQDMHSY        | TRUE  | TRUE                                          | TRUE
210           | LYTNWYLESLF      | FALSE | FALSE                                         | FALSE
210           | VEHINISQDW       | TRUE  | TRUE                                          | FALSE
210           | VYCEEYYLF        | TRUE  | TRUE                                          | FALSE
210           | VYCEEYYLFS       | FALSE | FALSE                                         | FALSE
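The figures quoted for patient No. 1 and the agreement shown in Table 2 can be checked with a few lines. The counts and TRUE/FALSE values are copied from this example; the function and variable names are illustrative only.

```python
def ppv(true_positives, predicted_positives):
    """Positive predictive value: verified positives among predicted positives."""
    return true_positives / predicted_positives


# Patient No. 1: 2 TESLA-verified positives among the 20 selected fragments.
patient_1_ppv = ppv(2, 20)

# Table 2, patient No. 210, in row order.
tesla = [True, False, True, True, False]
invention = [True, False, True, True, False]
mhcpan = [True, False, False, False, False]

agreement_invention = sum(t == p for t, p in zip(tesla, invention))
agreement_mhcpan = sum(t == p for t, p in zip(tesla, mhcpan))
```

On these five fragments the method of the invention agrees with TESLA in all five cases, while MHCpan agrees in three; the two disagreements are verified positives that MHCpan labels negative.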
In theory, using mass spectrometry data skips the intermediate steps of neoantigen formation and goes directly from the mutant peptide fragment to whether it is presented to T cells, so a neoantigen model can be trained more accurately than with the existing affinity data set (IEDB). The data of Example 2 also show that the method of the invention predicts neoantigens better than the published software MHCpan.
The present invention has been described in terms of specific examples, which are provided to aid understanding of the invention and are not intended to be limiting. For a person skilled in the art to which the invention pertains, several simple deductions, modifications or substitutions may be made according to the idea of the invention.

Claims (10)

1. A method for neoantigen prediction based on next generation sequencing, the method comprising:
acquiring second-generation genome sequencing data of a tumor sample and a normal sample from the same individual and second-generation sequencing data of a tumor transcriptome;
carrying out variation detection on the second-generation genome sequencing data to obtain tumor somatic point mutation and insertion deletion mutation as candidate mutation; performing fusion gene mutation detection on the second-generation sequencing data of the tumor transcriptome to obtain fusion gene mutation as candidate mutation;
detecting the HLA molecule types of the tumor sample and the normal sample to obtain an HLA typing result of the tumor sample matched with the normal sample;
annotating the point mutations, insertion-deletion mutations and fusion gene mutations among the candidate mutations from gene mutation to amino acid mutation;
predicting peptide fragments of point mutation, insertion deletion mutation and fusion gene mutation in the candidate mutation based on the result obtained by the annotation to obtain corresponding mutation prediction peptide fragments;
inputting the mutation prediction peptide fragment and the HLA typing result into a neoantigen prediction model, wherein the neoantigen prediction model is a model obtained by machine-learning fitting using mass spectrometry detection data of neoantigens on the surface of tumor cells as positive data; and scoring and ranking with the neoantigen prediction model to obtain neoantigen prediction results ordered by score from high to low.
2. The method of claim 1, wherein at least two mutation detection methods are used to detect mutations in the second-generation genome sequencing data, and the intersection of the results of these mutation detection methods is used as the candidate mutations.
3. The method of claim 1, wherein two HLA molecule type detection methods are used to detect the HLA molecule types of the tumor sample and the normal sample, and wherein if at least one of the HLA molecules of the tumor sample detected by the detection methods matches the normal sample, an HLA typing result is output, and otherwise an empty result is output.
4. The method of claim 1, wherein the predicting of the peptide fragments of the point mutation, the indel mutation and the fusion gene mutation of the candidate mutations comprises:
taking a peptide fragment that contains the point-mutated amino acid and whose length is within a third number of amino acids as the mutation prediction peptide fragment of the point mutation;
taking a peptide fragment that contains the amino acid at the position of the insertion-deletion mutation and whose length is within the third number of amino acids as the mutation prediction peptide fragment of the insertion-deletion mutation;
taking a peptide fragment that contains the amino acid at the fusion site of the fusion gene mutation and whose length is within the third number of amino acids as the mutation prediction peptide fragment of the fusion gene mutation.
5. The method of claim 1, wherein the neo-antigen prediction model is trained by:
acquiring neoantigen peptide fragment data detected by mass spectrometry as presented on the surface of tumor cells, as positive training data, together with the HLA-I typing information corresponding to the neoantigen peptide fragments; taking peptide fragments with lengths within a first number of amino acids, and randomly cutting peptide fragments longer than a second number of amino acids to lengths within a third number of amino acids;
acquiring protein sequence data of non-tumor patients, and randomly cutting the protein sequences into peptide fragments with lengths within the third number of amino acids as negative training data;
performing one-hot encoding on the positive and negative training data to convert the peptide fragments into a first matrix; mapping each element in the first matrix to a vector element, processing the vector elements with a fully connected layer to obtain a vector whose length is a fifth number, standardizing this vector, and processing it with a further fully connected layer to obtain a vector whose length is a fourth number;
performing one-hot encoding on the HLA-I typing information to convert it into a second matrix, and compressing the second matrix into a vector whose length is the fourth number; and taking the dot product of the vector of the fourth-number length corresponding to the positive or negative training data and the vector of the fourth-number length corresponding to the HLA-I typing information, and outputting the resulting score as the neoantigen score.
6. The method of claim 5, wherein the first number is 6-16; the second number is 11; the third number is 9-11; said fourth number is 74; the fifth number is 256.
7. The method of claim 5, wherein the HLA-I typing information comprises alleles a-1, a-2, b-1, b-2, c-1 and c-2.
8. The method of claim 5, wherein the score is defined as the probability of being a neoantigen, and the loss function for each peptide fragment is: $\mathrm{loss}(i) = -\log\bigl(\mathrm{Bernoulli}\bigl(y_i \mid \Pr(\text{peptide } p_i \text{ presented})\bigr)\bigr)$; $y_i$ is the classification value, 0 for negative and 1 for positive; Pr denotes the probability of being presented as a neoantigen, which is substituted into the Bernoulli formula for cumulative calculation.
9. A neoantigen prediction device based on next generation sequencing, the device comprising:
the data acquisition unit is used for acquiring second-generation genome sequencing data of a tumor sample and a normal sample from the same individual and second-generation sequencing data of a tumor transcriptome;
a mutation detection unit for performing mutation detection on the second-generation genome sequencing data to obtain tumor somatic point mutation and insertion deletion mutation as candidate mutations; performing fusion gene mutation detection on the second-generation sequencing data of the tumor transcriptome to obtain fusion gene mutation as candidate mutation;
the HLA typing unit is used for detecting the HLA molecule types of the tumor sample and the normal sample to obtain an HLA typing result in which the HLA molecules of the tumor sample match those of the normal sample;
a mutation annotation unit for performing annotation of gene mutation to amino acid mutation for a point mutation, an indel mutation and a fusion gene mutation among the candidate mutations;
a mutation peptide fragment prediction unit for predicting the peptide fragments of point mutation, insertion deletion mutation and fusion gene mutation in the candidate mutation based on the result obtained by the annotation to obtain corresponding mutation prediction peptide fragments;
the neoantigen prediction unit is used for inputting the mutation prediction peptide fragment and the HLA typing result into a neoantigen prediction model, wherein the neoantigen prediction model is a model obtained by machine-learning fitting using mass spectrometry detection data of neoantigens on the surface of tumor cells as positive data; and for scoring and ranking with the neoantigen prediction model to obtain neoantigen prediction results ordered by score from high to low.
10. A computer-readable storage medium, characterized by comprising a program executable by a processor to implement the method of any one of claims 1-8.
CN201911011327.8A 2019-10-23 2019-10-23 Method, device and storage medium for predicting neoantigen based on second-generation sequencing Active CN110752041B (en)


Publications (2)

Publication Number Publication Date
CN110752041A true CN110752041A (en) 2020-02-04
CN110752041B CN110752041B (en) 2023-11-07

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111798919A (en) * 2020-06-24 2020-10-20 上海交通大学 Tumor neoantigen prediction method, prediction device and storage medium
CN112210596A (en) * 2020-09-08 2021-01-12 中生康元生物科技(北京)有限公司 Tumor neoantigen prediction method based on gene fusion event and application thereof
CN112885406A (en) * 2020-04-16 2021-06-01 深圳裕策生物科技有限公司 Method and system for detecting HLA heterozygosity loss
CN113160887A (en) * 2021-04-23 2021-07-23 哈尔滨工业大学 Screening method of tumor neoantigen fused with single cell TCR sequencing data
CN114882951A (en) * 2022-05-27 2022-08-09 深圳裕泰抗原科技有限公司 Method and device for detecting MHC II tumor neoantigen based on next generation sequencing data
CN115424740A (en) * 2022-09-30 2022-12-02 四川大学华西医院 Tumor immunotherapy effect prediction system based on NGS and deep learning
CN116994654A (en) * 2023-09-27 2023-11-03 北京立康生命科技有限公司 Method, apparatus and storage medium for identifying MHC-I/HLA-I binding and TCR recognition peptides

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105524984A (en) * 2014-09-30 2016-04-27 深圳华大基因科技有限公司 Method and equipment for neoantigen epitope prediction
CN108796055A (en) * 2018-06-12 2018-11-13 深圳裕策生物科技有限公司 Tumor neogenetic antigen detection method, device and storage medium based on the sequencing of two generations
CN109682978A (en) * 2017-11-30 2019-04-26 丁平 A kind of Tumor mutations peptide MHC is affine force prediction method and its application
CN110060738A (en) * 2019-04-03 2019-07-26 中国人民解放军军事科学院军事医学研究院 Method and system based on machine learning techniques prediction bacterium protective antigens albumen
CN111465989A (en) * 2017-10-10 2020-07-28 磨石肿瘤生物技术公司 Identification of neoantigens using hot spots

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105524984A (en) * 2014-09-30 2016-04-27 深圳华大基因科技有限公司 Method and equipment for neoantigen epitope prediction
CN111465989A (en) * 2017-10-10 2020-07-28 磨石肿瘤生物技术公司 Identification of neoantigens using hot spots
CN109682978A (en) * 2017-11-30 2019-04-26 丁平 A kind of Tumor mutations peptide MHC is affine force prediction method and its application
CN108796055A (en) * 2018-06-12 2018-11-13 深圳裕策生物科技有限公司 Tumor neogenetic antigen detection method, device and storage medium based on the sequencing of two generations
CN110060738A (en) * 2019-04-03 2019-07-26 中国人民解放军军事科学院军事医学研究院 Method and system based on machine learning techniques prediction bacterium protective antigens albumen

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
NEO君: "如何鉴定一个真的新生抗原", pages 2, Retrieved from the Internet <URL:https://mp.weixin.qq.com/s/HLEU5NXRAT14D6xrZaPsmg> *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112885406A (en) * 2020-04-16 2021-06-01 深圳裕策生物科技有限公司 Method and system for detecting HLA heterozygosity loss
CN111798919A (en) * 2020-06-24 2020-10-20 上海交通大学 Tumor neoantigen prediction method, prediction device and storage medium
CN112210596A (en) * 2020-09-08 2021-01-12 中生康元生物科技(北京)有限公司 Tumor neoantigen prediction method based on gene fusion event and application thereof
CN112210596B (en) * 2020-09-08 2022-04-26 中生康元生物科技(北京)有限公司 Tumor neoantigen prediction method based on gene fusion event and application thereof
CN113160887A (en) * 2021-04-23 2021-07-23 哈尔滨工业大学 Screening method of tumor neoantigen fused with single cell TCR sequencing data
CN113160887B (en) * 2021-04-23 2022-06-14 哈尔滨工业大学 Screening method of tumor neoantigen fused with single cell TCR sequencing data
CN114882951A (en) * 2022-05-27 2022-08-09 深圳裕泰抗原科技有限公司 Method and device for detecting MHC II tumor neoantigen based on next generation sequencing data
CN114882951B (en) * 2022-05-27 2022-12-27 深圳裕泰抗原科技有限公司 Method and device for detecting MHC II tumor neoantigen based on next generation sequencing data
CN115424740A (en) * 2022-09-30 2022-12-02 四川大学华西医院 Tumor immunotherapy effect prediction system based on NGS and deep learning
CN115424740B (en) * 2022-09-30 2023-11-17 四川大学华西医院 Tumor immunotherapy effect prediction system based on NGS and deep learning
CN116994654A (en) * 2023-09-27 2023-11-03 北京立康生命科技有限公司 Method, apparatus and storage medium for identifying MHC-I/HLA-I binding and TCR recognition peptides
CN116994654B (en) * 2023-09-27 2023-12-29 北京立康生命科技有限公司 Method, apparatus and storage medium for identifying MHC-I/HLA-I binding and TCR recognition peptides

Also Published As

Publication number Publication date
CN110752041B (en) 2023-11-07

Similar Documents

Publication Publication Date Title
CN110752041B (en) Method, device and storage medium for predicting neoantigen based on second-generation sequencing
CN108796055B (en) Method, device and storage medium for detecting tumor neoantigen based on second-generation sequencing
DeWitt III et al. Human T cell receptor occurrence patterns encode immune history, genetic background, and receptor specificity
WO2022016125A1 (en) Attention-based neural network to predict peptide binding, presentation, and immunogenicity
CN109584960B (en) Method, device and storage medium for predicting tumor neoantigen
CN108601731A (en) Identification, manufacture and use of neoantigens
CN110799196B (en) Ranking system for immunogenic cancer specific epitope
CN113424264B (en) Cancer mutation selection for generating personalized cancer vaccine
CN110277135B (en) Method and system for selecting individualized tumor neoantigen based on expected curative effect
EP4116436A1 (en) Method and system for screening for neoantigens, and uses thereof
Boegel et al. Bioinformatic methods for cancer neoantigen prediction
CN110706742A (en) Pan-cancer tumor neoantigen high-throughput prediction method and application thereof
CN111755067A (en) Screening method of tumor neoantigen
CN112885406A (en) Method and system for detecting HLA heterozygosity loss
CN112210596B (en) Tumor neoantigen prediction method based on gene fusion event and application thereof
CN115424740B (en) Tumor immunotherapy effect prediction system based on NGS and deep learning
CN116580771A (en) Method and device for predicting tumor neoantigen
Mardis Neoantigen discovery in human cancers
Thrift et al. HLApollo: A superior transformer model for pan-allelic peptide-MHC-I presentation prediction, with diverse negative coverage, deconvolution and protein language features
WO2024032909A1 (en) Methods and systems for cancer-enriched motif discovery from splicing variations in tumours
CN113316818B (en) Method for identifying neoantigen
CN114333998A (en) Tumor neoantigen prediction method and system based on deep learning model
Al Seesi et al. Geneo: a bioinformatics toolbox for genomics-guided neoepitope prediction
James et al. In silico epitope prediction analyses highlight the potential for distracting antigen immunodominance with allogeneic cancer vaccines
RU2809620C2 (en) Selecting cancer mutations to create personalized cancer vaccine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant