CN115747327A - Novel antigen prediction methods involving frameshift mutations - Google Patents


Info

Publication number
CN115747327A
Authority
CN
China
Prior art keywords
software
tumor cells
tumor
patient
polypeptide
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210395664.7A
Other languages
Chinese (zh)
Inventor
舒洋
许恒
杨莉
丁振宇
魏于全
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Langgu Biotechnology Co ltd
Original Assignee
Chengdu Langgu Biotechnology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Langgu Biotechnology Co ltd filed Critical Chengdu Langgu Biotechnology Co ltd
Priority to CN202210395664.7A priority Critical patent/CN115747327A/en
Publication of CN115747327A publication Critical patent/CN115747327A/en

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Peptides Or Proteins (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The present invention relates to a prediction technique. It aims to overcome the low accuracy of existing tumor neoantigen prediction techniques and provides a neoantigen prediction method involving frameshift mutations. The technical scheme can be summarized as follows: first, sequencing data from a patient's tumor cells are aligned to a preset corresponding reference genome and the alignment results are preprocessed; the alignment results are then used to analyze somatic short insertion/deletion (indel) variants, clonal structure, tumor purity, gene expression levels in the tumor cells, and the expression abundance of somatic variant alleles; the polypeptides generated after mutation are calculated from the analysis results; the MHC class I and MHC class II genotypes of the patient's tumor cells are predicted, and the affinity and polypeptide presentation efficacy of potential neoantigens are analyzed; finally, each corresponding neoantigen is scored and ranked by combining the analysis results, and the ranked neoantigens are presented. The method improves efficiency and is suitable for neoantigen prediction.

Description

Novel antigen prediction methods involving frameshift mutations
Technical Field
The invention relates to a prediction technology, in particular to a prediction technology of individualized tumor neoantigens.
Background
Cancer is thought to result from the accumulation of a series of somatic mutations. Among these, nonsynonymous mutations in protein-coding regions produce mutant peptides, which are presented on the surface of cancer cells by the major histocompatibility complex (MHC, known in humans as human leukocyte antigen, HLA) through the endogenous antigen processing pathway. When exposed to the humoral environment, these mutant peptide-MHC complexes are detected by the host immune system and recognized by T cell receptors (TCRs), which in turn triggers T cell-mediated cellular immunity that specifically kills cancer cells. Mutant peptides that elicit cellular immunity are known as tumor neoantigens and are important drivers of anti-tumor immune responses.
Tumor neoantigens have long been considered ideal targets for anti-tumor immunotherapy. Because tumor neoantigens are expressed only in tumor cells and are not subject to central or peripheral immune tolerance, personalized immunotherapy targeting them is highly specific, strongly immunogenic, and safe, and has great therapeutic potential. Immunotherapies targeting tumor neoantigens mainly include neoantigen polypeptide vaccines, neoantigen RNA vaccines, and adoptive T cell therapy. In clinical studies in melanoma and glioblastoma, tumor neoantigen vaccines have been shown to induce neoantigen-specific T cells and to effectively prevent the recurrence of melanoma. Adoptive T cell therapy has been shown to exert anti-tumor effects in a variety of malignancies.
Tumor neoantigens may be derived from a variety of somatic mutation types. The most common and most widely used are nonsynonymous single-nucleotide variants (SNVs), i.e. single-base substitutions in the DNA sequence that change the amino acid translated at that site relative to the wild type. However, neoantigens derived from nonsynonymous SNVs differ from the wild type by only a single amino acid, so only a few of these candidate tumor neoantigens are immunogenic. In addition to SNV-derived tumor neoantigens, other genomic variants have recently been found to produce highly specific tumor neoantigens, including those derived from insertions or deletions (indels), alternative splicing, and gene fusions. Compared with SNVs, these mutation types alter the polypeptide sequence more extensively and more readily generate immunogenic tumor neoantigens. A frameshift mutation is a mutation in which an indel whose length is not a multiple of three bases changes the open reading frame (ORF) during translation. In most cases, the altered ORF is translated into an entirely new amino acid sequence. Frameshift indels can therefore produce novel polypeptide products that have no wild-type or normal counterpart and form a highly immunogenic antigen pool, increasing both the number of neoantigens and the number of effector T cells that specifically recognize them. It has been reported in the literature that tumor neoantigens derived from frameshift mutations are 9 times as numerous as those derived from SNVs.
With the wide application of second-generation sequencing to clinical tumor samples, researchers can quickly obtain high-throughput genome and transcriptome sequencing data from patients. These data can be used to identify neoantigens resulting from specific changes in the tumor genome and to predict the anti-tumor immune responses they cause, contributing to the development of personalized immunotherapy. The key link is to predict tumor neoantigens effectively from the sequencing data. The field of neoantigen prediction is developing rapidly, and related prediction schemes and algorithms are being developed or improved, but significant challenges remain. At present, the best-practice workflow for neoantigen prediction mainly comprises the following steps: 1) performing whole-exome sequencing of the tumor and matched normal DNA samples on a high-throughput sequencing platform, and transcriptome sequencing of the tumor RNA sample; 2) quality control and filtering of the sequencing data; 3) alignment to a reference genome; 4) base quality recalibration and local realignment around insertions/deletions; 5) somatic mutation detection; 6) inferring the mutant peptides resulting from somatic mutations; 7) HLA allele typing; 8) assessing the binding and presentation of mutant peptides by HLA alleles; 9) prioritization and selection of candidate neoantigens. The key steps are HLA allele typing and prediction of the binding of mutant peptides to HLA molecules. Whether a neoantigen can be presented on the cell surface and recognized by T cells depends primarily on the patient's HLA allele type. CD8+ T cells recognize HLA class I molecules, which consist of the conserved β2-microglobulin and an α chain; the α chain is highly polymorphic and is encoded by three loci on human chromosome 6: HLA-A, HLA-B and HLA-C. Similarly, CD4+ T cells recognize HLA class II molecules, encoded mainly by the DRB, DPB and DPA genes.
Because HLA genes are highly polymorphic, HLA allele typing is a complex task. Most typing methods are based on DNA sequencing data. Of these, OptiType and Polysolver identify only HLA class I alleles, whereas other methods such as arcasHLA, seq2HLA, xHLA and ATHLATES can identify both HLA class I and class II genotypes.
The most critical step of the whole tumor neoantigen prediction process is to predict the interaction between mutant peptides and HLA alleles and, from the result, to identify potential tumor neoantigens among the mutant peptides. Researchers have developed a variety of computational methods for this purpose. Most current methods rely on machine learning algorithms, including linear regression (LR) and artificial neural networks (ANN), and build predictive models by training on large-scale experimental datasets of HLA-binding peptides. Among them, NetMHCpan, which predicts the affinity of a peptide for a selected HLA allele with an ensemble of several ANNs, is currently the most widely used software. Prediction of HLA class II neoantigens is less accurate and less precise than prediction of HLA class I neoantigens, mainly because the peptide-binding groove of HLA class II is "open", which increases the amount of data required to train an accurate machine learning model.
Existing neoantigen prediction techniques mainly have the following problems:
1. Current tumor neoantigen prediction is based mainly on the analysis of single-nucleotide variants. A polypeptide produced by a single-nucleotide change differs by only one amino acid, which limits the gain in antigenicity; in particular, patients with few single-nucleotide mutations often do not yield enough neoantigens and therefore cannot be treated with tumor neoantigens;
2. Nonsense-mediated mRNA degradation is not taken into account, which leaves an excessive number of candidate neoantigens to screen and means that some tumor neoantigen polypeptides predicted to have good antigenicity are not actually expressed by the tumor cells;
3. Current prediction software is based on either exome sequencing or transcriptome sequencing and does not consider the DNA and RNA of the tumor cells together. If neoantigens are predicted from DNA sequencing data alone, the number of antigens is overestimated, especially for genes that are not transcribed or are transcribed at very low levels. Conversely, if neoantigens are predicted from RNA sequencing data alone, transcriptome alignment becomes a problem: a sequenced fragment that spans an exon junction is split into shorter fragments, which greatly increases the error rate of the variants found.
Disclosure of Invention
The invention aims to overcome the defect of low accuracy of the conventional tumor neoantigen prediction technology and provides a neoantigen prediction method related to frameshift mutation.
To solve this technical problem, the invention adopts the following technical scheme: a neoantigen prediction method involving frameshift mutations, comprising the following steps:
step 1, aligning sequencing data of the patient's tumor cells to a preset corresponding reference genome to obtain a DNA alignment result and an RNA alignment result;
step 2, preprocessing the DNA alignment result and the RNA alignment result;
step 3, using the DNA alignment result and the RNA alignment result to analyze somatic short insertion/deletion (indel) variants, clonal structure, tumor purity, gene expression levels in the tumor cells, and the expression abundance of somatic variant alleles, to obtain an analysis result;
step 4, calculating, from the somatic short indel variants in the analysis result, the polypeptides generated after mutation, to obtain a DNA analysis result;
step 5, predicting the MHC class I genotype and the MHC class II genotype of the patient's tumor cells, and analyzing the affinity and polypeptide presentation efficacy of potential neoantigens according to the DNA analysis result and these genotypes;
step 6, scoring and ranking each corresponding neoantigen according to the analysis result, the affinity of the potential neoantigen, and the polypeptide presentation efficacy, and presenting the result.
Specifically, in step 1, the sequencing data of the patient's tumor cells are aligned to the preset corresponding reference genome: the DNA alignment result is obtained with a Burrows-Wheeler transform algorithm, and the RNA alignment result is obtained with the STAR algorithm.
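As a rough illustration of step 1, the two alignments could be launched from Python by assembling the aligner command lines. This is only a sketch: the file names, the index directory and the thread counts are hypothetical, and the patent does not state which options it uses. The functions only build argument lists, so nothing is executed here; note that `bwa mem` writes SAM to standard output.

```python
from typing import List


def bwa_mem_command(reference: str, fastq1: str, fastq2: str,
                    threads: int = 4) -> List[str]:
    """Build a BWA-MEM command for paired-end DNA reads (SAM goes to stdout)."""
    return ["bwa", "mem", "-t", str(threads), reference, fastq1, fastq2]


def star_command(genome_dir: str, fastq1: str, fastq2: str,
                 out_prefix: str, threads: int = 4) -> List[str]:
    """Build a STAR command for paired-end RNA reads, writing a sorted BAM."""
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,          # directory holding a prebuilt STAR index
        "--readFilesIn", fastq1, fastq2,
        "--outSAMtype", "BAM", "SortedByCoordinate",
        "--outFileNamePrefix", out_prefix,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(" ".join(bwa_mem_command("ref.fa", "tumor_1.fq", "tumor_2.fq")))
    print(" ".join(star_command("star_index", "rna_1.fq", "rna_2.fq", "tumor_rna.")))
```

The lists can be passed to `subprocess.run` once the tools and indexes are available; keeping command construction separate from execution makes the step easy to inspect.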
Further, before step 1, the method further comprises the following steps:
step 0, preprocessing the raw sequencing data of the patient's tumor cells to obtain the sequencing data of the patient's tumor cells.
Specifically, in step 0, the preprocessing includes trimming sequencing adapter sequences, trimming sequencing tag sequences, and removing sequences of poor sequencing quality, and may be performed with Trimmomatic software.
Still further, in step 2, the preprocessing includes removing duplicates, adding read group information, and base quality score recalibration, and may be performed with GATK software.
Specifically, the step 3 comprises the following steps:
step 3A, analyzing somatic short indel sites in the tumor cells using the DNA alignment result;
step 3B, filtering out false-positive sites from the somatic short indel sites found in step 3A to obtain the actual somatic short indel sites;
step 3C, annotating and further filtering the actual somatic short indel sites to obtain the somatic short indel variants;
step 3D, calculating the somatic copy number variation of the tumor;
step 3E, calculating the tumor purity and the clonal structure from the somatic short indel variants and the somatic copy number variation of the tumor;
step 3F, calculating the gene expression levels in the tumor cells from the RNA alignment result;
step 3G, analyzing the actual somatic short indel sites together with the RNA alignment result to obtain the expression abundance of the somatic variant alleles.
Further, in step 3A, analyzing the somatic short indel sites in the tumor cells using the DNA alignment result means analyzing the DNA alignment result separately with Mutect2 and/or Strelka2 and/or VarScan2 and/or VarDict and/or LoFreq and/or Scalpel software.
Specifically, in step 3B, filtering out false-positive sites from the somatic short indel sites means integrating the results of step 3A and filtering out false-positive sites with SomaticSeq software.
Still further, ANNOVAR software is used for annotation in step 3C; CNVkit software is used for the calculation in step 3D; in step 3E, tumor purity and clonal structure are calculated with ABSOLUTE software, where the clonal structure refers to the clonal proportion of a mutant gene, i.e. the clonal proportion of each somatic mutation site; in step 3F, gene expression levels in the tumor cells are calculated with RSEM software, where analyzing the gene expression level means using RSEM to obtain, for each gene, the fragments per kilobase of transcript per million mapped fragments (FPKM) value; in step 3G, the analysis is performed with bcftools software.
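The quantities produced in steps 3F and 3G can be illustrated with a few lines of arithmetic. The sketch below uses the standard FPKM definition and the standard FPKM-to-TPM conversion; treating the variant allele expression proportion as the fraction of RNA reads carrying the variant allele is an assumption made here for illustration, since the patent does not give the exact calculation, and all function names are hypothetical.

```python
from typing import List


def fpkm(fragments: int, gene_length_bp: int, total_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (gene_length_bp * total_fragments)


def fpkm_to_tpm(fpkm_values: List[float]) -> List[float]:
    """Rescale FPKM values so that they sum to one million (TPM)."""
    total = sum(fpkm_values)
    return [v / total * 1e6 if total > 0 else 0.0 for v in fpkm_values]


def variant_allele_fraction(alt_reads: int, ref_reads: int) -> float:
    """Assumed definition: share of RNA reads at the site that carry the variant."""
    depth = alt_reads + ref_reads
    return alt_reads / depth if depth > 0 else 0.0


if __name__ == "__main__":
    print(fpkm(500, 2000, 10_000_000))      # 500 fragments on a 2 kb gene, 10 M mapped
    print(fpkm_to_tpm([25.0, 75.0]))
    print(variant_allele_fraction(30, 70))
```

In the pipeline itself these values come from RSEM and bcftools; the functions only show what the numbers mean.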
Specifically, the step 4 comprises the following steps:
step 4A, filtering the actual somatic short indel sites to remove short indel mutations that would trigger nonsense-mediated mRNA degradation;
step 4B, based on the annotated actual somatic short indel sites, extracting the 15 amino acids before and after each mutation site to obtain the amino acid sequence of the mutant polypeptide, and combining all of the sequences into a FASTA file of polypeptide sequences, i.e. the short indel mutant proteins, which serve as the polypeptides generated after mutation.
Further, in step 4A, the filtering is performed with masonmd software; in step 4B, the processing is performed with customProtDB software.
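To make step 4B concrete, the following sketch applies an indel to a coding sequence, translates it with the standard genetic code, and extracts a window of amino acids around the first changed residue. It is a simplified illustration, not the patent's implementation: it assumes a forward-strand coding sequence that starts at the start codon, ignores splicing and nonsense-mediated decay, and the function names, the toy sequence and the 0-based `offset` convention are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate a coding sequence up to (not including) the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def is_frameshift(ref: str, alt: str) -> bool:
    """An indel shifts the frame when its net length is not a multiple of 3."""
    return (len(alt) - len(ref)) % 3 != 0


def apply_indel(cds: str, offset: int, ref: str, alt: str) -> str:
    """Replace ref with alt at a 0-based offset within the coding sequence."""
    if cds[offset:offset + len(ref)] != ref:
        raise ValueError("reference allele does not match the coding sequence")
    return cds[:offset] + alt + cds[offset + len(ref):]


def mutant_peptide(cds: str, offset: int, ref: str, alt: str, flank: int = 15) -> str:
    """Return up to `flank` residues on each side of the first changed residue."""
    wild = translate(cds)
    mutant = translate(apply_indel(cds, offset, ref, alt))
    first_diff = next(
        (i for i, (w, m) in enumerate(zip(wild, mutant)) if w != m),
        min(len(wild), len(mutant)),
    )
    return mutant[max(0, first_diff - flank):first_diff + flank + 1]


if __name__ == "__main__":
    cds = "ATGGCTGAAACTGGATCCTAA"          # toy sequence encoding MAETGS
    print(translate(cds))
    print(is_frameshift("GA", "G"))
    print(mutant_peptide(cds, 6, "GA", "G"))
```

Each peptide returned this way could then be written as one record of the FASTA file described in step 4B.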
Still further, step 5 comprises the steps of:
step 5A, analyzing the MHC class I genotype and the MHC class II genotype of the patient's tumor cells, and selecting the HLA genotype with the highest consistency among the analysis results as the patient's MHC genotype;
step 5B, predicting the affinity between the short indel mutant proteins and the patient's MHC molecules;
step 5C, filtering by the predicted affinity to select the polypeptides whose affinity value is below a given threshold, analyzing these polypeptides together with the patient's corresponding MHC genotype, and calculating the polypeptide presentation efficacy.
Specifically, in step 5A, the MHC class I genotype of the patient's tumor cells is analyzed with Polysolver software and/or HLA-VBSeq software and/or OptiType software and/or HLA-HD software and/or xHLA software, and the MHC class II genotype is analyzed with HLA-VBSeq software and/or HLA-HD software and/or xHLA software; in step 5B, the prediction is performed with the MHC class I epitope predictors and/or MHC class II epitope predictors provided by IEDB and/or MHCflurry and APPM software; in step 5C, the calculation is performed with NetCTLpan software.
Still further, in step 6, each corresponding neoantigen is scored according to the analysis result, the affinity of the potential neoantigen, and the polypeptide presentation efficacy as follows:
score = abundance * dissimilarity * clonality
wherein:
abundance = L(IC_m) * Freq * tanh(TPM);
dissimilarity = (1 - L(IC_w / 50000)) / 2;
clonality = NCTL * MC;
L(x) = 1 / (1 + e^x); tanh() is the hyperbolic tangent function; score is the score calculated for a neoantigen; abundance is the abundance of the mutant polypeptide; dissimilarity measures the difference in affinity between the mutant peptide and the corresponding normal peptide; clonality measures the efficiency with which the neoantigen is successfully presented; TPM is the gene expression abundance; Freq is the expression abundance proportion of the somatic variant allele, i.e. the ratio of the expression abundance of the somatic variant allele to the expression abundance of the allele; IC_m is the mutant polypeptide affinity obtained in the affinity prediction; IC_w is the wild-type polypeptide binding affinity obtained in the affinity prediction; NCTL is the polypeptide presentation efficacy; and MC is the clonal proportion of the mutant gene.
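The scoring formula can be written out as a short function. This is a sketch under stated assumptions: the flattened expression for L is read as L(x) = 1/(1 + e^x), the inputs are passed in whatever units the upstream tools report (the patent does not specify units or scaling for IC_m and IC_w), and the function and parameter names are hypothetical.

```python
import math


def logistic_l(x: float) -> float:
    """L(x) = 1 / (1 + e^x), as the formula is read here; decreasing in x."""
    if x > 700.0:
        # math.exp would overflow; the limit of L(x) for large x is 0.
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


def neoantigen_score(ic_m: float, ic_w: float, freq: float,
                     tpm: float, nctl: float, mc: float) -> float:
    """score = abundance * dissimilarity * clonality."""
    abundance = logistic_l(ic_m) * freq * math.tanh(tpm)
    dissimilarity = (1.0 - logistic_l(ic_w / 50000.0)) / 2.0
    clonality = nctl * mc
    return abundance * dissimilarity * clonality


def rank_candidates(candidates: dict) -> list:
    """Sort peptide names by descending score; values are keyword dicts."""
    scored = {name: neoantigen_score(**kw) for name, kw in candidates.items()}
    return sorted(scored, key=scored.get, reverse=True)


if __name__ == "__main__":
    # Made-up inputs, for illustration only.
    demo = {
        "pep1": dict(ic_m=0.0, ic_w=50000.0, freq=0.5, tpm=10.0, nctl=0.8, mc=1.0),
        "pep2": dict(ic_m=0.0, ic_w=50000.0, freq=0.1, tpm=10.0, nctl=0.8, mc=1.0),
    }
    print(rank_candidates(demo))
```

Because every factor is non-negative, a candidate scores zero whenever the variant allele is not expressed (Freq = 0 or TPM = 0) or is absent from the tumor clone (MC = 0), which matches the stated aim of discarding unexpressed candidates.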
The invention has the following beneficial effects. By adopting the neoantigen prediction method involving frameshift mutations, short indels are used as a source of neoantigens, which increases the effective number of neoantigens. Nonsense-mediated mRNA degradation is taken into account when predicting whether a short indel can give rise to a potential neoantigen, which on the one hand effectively reduces the total number of candidate neoantigens and improves prediction efficiency, and on the other hand avoids selecting neoantigens that are not expressed as protein. The invention further integrates the steps from alignment of the raw data to ranking of the potential neoantigens, so that all potential neoantigen information for a patient can be obtained with a simple computer command, together with polypeptide sequence information that can be used clinically for neoantigen immunotherapy. Although the data are processed and analyzed with existing open-source software, in a conventional analysis workflow the software packages are relatively independent of one another, must be run separately and then integrated, and have complex dependencies among them. In the invention, the runtime environment for all of the software can be built on a portable virtual machine platform, so that the analysis can be deployed on various operating systems and used immediately after installation.
Drawings
FIG. 1 is a flow chart of the method of the present invention for predicting novel antigens involving frameshift mutations.
FIG. 2 is a diagram illustrating ELISPOT results in an embodiment of the present invention.
Detailed Description
The technical solution of the present invention is described in detail below with reference to examples.
The neoantigen prediction method involving frameshift mutations according to the invention comprises the following steps:
step 1, aligning sequencing data of the patient's tumor cells to a preset corresponding reference genome to obtain a DNA alignment result and an RNA alignment result;
step 2, preprocessing the DNA alignment result and the RNA alignment result;
step 3, using the DNA alignment result and the RNA alignment result to analyze somatic short insertion/deletion (indel) variants, clonal structure, tumor purity, gene expression levels in the tumor cells, and the expression abundance of somatic variant alleles, to obtain an analysis result;
step 4, calculating, from the somatic short indel variants in the analysis result, the polypeptides generated after mutation, to obtain a DNA analysis result;
step 5, predicting the MHC class I genotype and the MHC class II genotype of the patient's tumor cells, and analyzing the affinity and polypeptide presentation efficacy of potential neoantigens according to the DNA analysis result and these genotypes;
step 6, scoring and ranking each corresponding neoantigen according to the analysis result, the affinity of the potential neoantigen, and the polypeptide presentation efficacy, and presenting the result.
To obtain the DNA alignment result and the RNA alignment result in step 1, the sequencing data of the patient's tumor cells are aligned to the preset corresponding reference genome; the DNA alignment result is preferably obtained with a Burrows-Wheeler transform algorithm, and the RNA alignment result is preferably obtained with the STAR algorithm.
Because sequencing the patient's tumor cells yields raw sequencing data that are not convenient to use directly, the following step may be included before step 1:
step 0, preprocessing the raw sequencing data of the patient's tumor cells to obtain the sequencing data of the patient's tumor cells.
Here, the preprocessing includes trimming sequencing adapter sequences, trimming sequencing tag sequences, removing sequences of poor sequencing quality, and the like, and may be performed with Trimmomatic software.
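One part of this preprocessing, removal of low-quality bases, can be sketched as follows. The example trims bases from the 3' end of a read while their Phred quality is below a threshold and then drops reads that become too short, which is similar in spirit to trailing-quality and minimum-length filters in read trimmers. It is a simplified stand-in rather than the tool's actual algorithm; the thresholds and function names are assumptions, and Phred+33 encoding is assumed.

```python
from typing import Optional, Tuple


def trim_trailing(seq: str, qual: str, min_q: int = 20,
                  offset: int = 33) -> Tuple[str, str]:
    """Remove 3' bases whose Phred quality (ASCII minus offset) is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def preprocess_read(seq: str, qual: str, min_q: int = 20,
                    min_len: int = 36) -> Optional[Tuple[str, str]]:
    """Trim the read and return None if it ends up shorter than min_len."""
    trimmed_seq, trimmed_qual = trim_trailing(seq, qual, min_q)
    if len(trimmed_seq) < min_len:
        return None
    return trimmed_seq, trimmed_qual


if __name__ == "__main__":
    # 'I' encodes Phred 40 and '#' encodes Phred 2 in Phred+33.
    print(trim_trailing("ACGTAC", "IIII##"))
    print(preprocess_read("ACGTAC", "IIII##", min_len=5))
```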
As to the preprocessing of the DNA alignment result and the RNA alignment result in step 2, the preprocessing includes removing duplicates, adding read group information, base quality score recalibration, and the like, and may be performed with GATK software.
To refine step 3, step 3 may comprise the following steps:
step 3A, analyzing somatic short indel sites in the tumor cells using the DNA alignment result;
step 3B, filtering out false-positive sites from the somatic short indel sites found in step 3A to obtain the actual somatic short indel sites;
step 3C, annotating and further filtering the actual somatic short indel sites to obtain the somatic short indel variants;
step 3D, calculating the somatic copy number variation of the tumor;
step 3E, calculating the tumor purity and the clonal structure from the somatic short indel variants and the somatic copy number variation of the tumor;
step 3F, calculating the gene expression levels in the tumor cells from the RNA alignment result;
step 3G, analyzing the actual somatic short indel sites together with the RNA alignment result to obtain the expression abundance of the somatic variant alleles.
As to the software and method for step 3A, the somatic short indel sites in the tumor cells may be analyzed by running Mutect2 and/or Strelka2 and/or VarScan2 and/or VarDict and/or LoFreq and/or Scalpel software separately on the DNA alignment result. Preferably all of the above software is run separately; if only some of the software or a single program is used, a result can still be obtained, but its accuracy is lower.
As to the software for step 3B, the false-positive sites may be filtered by integrating the results of step 3A and filtering out false-positive sites with SomaticSeq software.
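The idea of integrating several callers can be illustrated with a plain voting scheme. Note that this is a deliberately simpler technique than the one named in the text: SomaticSeq combines caller outputs with a trained classifier, whereas the sketch below just keeps indels reported by at least a chosen number of callers. The tuple layout, the `min_callers` threshold and the example calls are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# (chromosome, position, reference allele, alternate allele)
Indel = Tuple[str, int, str, str]


def consensus_indels(calls: Dict[str, Iterable[Indel]],
                     min_callers: int = 2) -> List[Indel]:
    """Keep indels reported by at least `min_callers` distinct callers."""
    votes: Counter = Counter()
    for variants in calls.values():
        votes.update(set(variants))      # each caller votes once per indel
    return sorted(v for v, n in votes.items() if n >= min_callers)


if __name__ == "__main__":
    # Made-up calls, for illustration only.
    calls = {
        "mutect2": [("chr1", 100, "A", "AT"), ("chr2", 5, "GC", "G")],
        "strelka2": [("chr1", 100, "A", "AT")],
        "varscan2": [("chr1", 100, "A", "AT"), ("chr3", 7, "T", "TA")],
    }
    print(consensus_indels(calls))
```

A vote of this kind only captures agreement between callers; it does not use the read-level features that a learned filter would.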
As to the software used in steps 3C to 3G: step 3C is preferably annotated with ANNOVAR software; step 3D is preferably calculated with CNVkit software; in step 3E, tumor purity and clonal structure are preferably calculated with ABSOLUTE software, where the clonal structure refers to the clonal proportion of a mutant gene, i.e. the clonal proportion of each somatic mutation site; in step 3F, gene expression levels in the tumor cells are preferably calculated with RSEM software, where analyzing the gene expression level means using RSEM to obtain, for each gene, the fragments per kilobase of transcript per million mapped fragments (FPKM) value; in step 3G, the analysis is preferably performed with bcftools software.
To refine step 4, step 4 may comprise the following steps:
step 4A, filtering the actual somatic short indel sites to remove short indel mutations that would trigger nonsense-mediated mRNA degradation;
step 4B, based on the annotated actual somatic short indel sites, extracting the 15 amino acids before and after each mutation site to obtain the amino acid sequence of the mutant polypeptide, and combining all of the sequences into a FASTA file of polypeptide sequences, i.e. the short indel mutant proteins, which serve as the polypeptides generated after mutation.
As to the software used in steps 4A and 4B, the filtering in step 4A is preferably performed with masonmd software, and the processing in step 4B is preferably performed with CustomProtDB software.
To refine step 5, step 5 may comprise the following steps:
step 5A, analyzing the MHC class I genotype and the MHC class II genotype of the patient's tumor cells, and selecting the HLA genotype with the highest consistency among the analysis results as the patient's MHC genotype;
step 5B, predicting the affinity between the short indel mutant proteins and the patient's MHC molecules;
step 5C, filtering by the predicted affinity to select the polypeptides whose affinity value is below a given threshold, analyzing these polypeptides together with the patient's corresponding MHC genotype, and calculating the polypeptide presentation efficacy.
As to the software used in steps 5A, 5B and 5C: in step 5A, the MHC class I genotype of the patient's tumor cells is analyzed with Polysolver software and/or HLA-VBSeq software and/or OptiType software and/or HLA-HD software and/or xHLA software. Preferably all of this software is run separately and the genotype predicted by the largest number of programs is adopted, which improves prediction accuracy; using only some of the software or a single program reduces prediction accuracy. The MHC class II genotype of the patient's tumor cells is analyzed with HLA-VBSeq software and/or HLA-HD software and/or xHLA software; again, preferably all of this software is run separately and the genotype predicted by the largest number of programs is adopted, which improves prediction accuracy, whereas using only some of the software or a single program reduces it. In step 5B, the prediction is performed with the MHC class I epitope predictors and/or MHC class II epitope predictors provided by IEDB and/or MHCflurry and APPM software; preferably all of this software is used for prediction separately, and all of the results form part of the final result data. In step 5C, the calculation is performed with NetCTLpan software.
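Selecting the genotype predicted by the largest number of programs amounts to a per-locus majority vote, which can be sketched as follows. The input layout (tool name, then locus, then a pair of alleles), the tie-breaking behavior and the example alleles are assumptions for illustration; the patent does not describe how ties between tools are resolved.

```python
from collections import Counter
from typing import Dict, Tuple

Genotype = Tuple[str, str]


def consensus_hla(predictions: Dict[str, Dict[str, Genotype]]) -> Dict[str, Genotype]:
    """For each locus, return the genotype reported by the most tools.

    Allele pairs are sorted so that the order in which a tool lists the two
    alleles does not matter. On a tie, the genotype seen first wins.
    """
    votes: Dict[str, Counter] = {}
    for typing_result in predictions.values():
        for locus, alleles in typing_result.items():
            genotype = tuple(sorted(alleles))
            votes.setdefault(locus, Counter())[genotype] += 1
    return {locus: counts.most_common(1)[0][0] for locus, counts in votes.items()}


if __name__ == "__main__":
    # Made-up typing results, for illustration only.
    predictions = {
        "polysolver": {"HLA-A": ("A*02:01", "A*24:02")},
        "optitype": {"HLA-A": ("A*24:02", "A*02:01")},
        "xhla": {"HLA-A": ("A*02:01", "A*11:01")},
    }
    print(consensus_hla(predictions))
```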
In order to provide a method for scoring each corresponding neo-antigen according to the analysis result, the affinity of the potential neo-antigen and the polypeptide presentation efficacy, in step 6, the method for scoring each corresponding neo-antigen according to the analysis result, the affinity of the potential neo-antigen and the polypeptide presentation efficacy may be:
$\mathrm{score} = \mathrm{abundance} \times \mathrm{dissimilarity} \times \mathrm{clonality}$
wherein:
$\mathrm{abundance} = L(IC_m) \times \mathrm{Freq} \times \tanh(\mathrm{TPM})$;
$\mathrm{dissimilarity} = (1 - L(IC_w / 50000)) / 2$;
$\mathrm{clonality} = \mathrm{NCTL} \times \mathrm{MC}$;
$L(x) = 1 / (1 + e^{x})$, and $\tanh()$ is the hyperbolic tangent function. Here score is the score calculated for the neoantigen; abundance is the abundance of the mutant polypeptide; dissimilarity measures the difference in affinity between the mutant peptide and the corresponding normal peptide; clonality measures the efficiency with which the neoantigen is successfully presented; TPM is the gene expression abundance; Freq is the expression abundance ratio of the somatic variation allele, that is, the ratio of the expression abundance of the somatic variation allele to the expression abundance of the allele; $IC_m$ is the mutant polypeptide affinity obtained in the affinity prediction; $IC_w$ is the wild-type polypeptide binding affinity obtained in the affinity prediction; NCTL is the polypeptide presentation efficiency; and MC is the clonal ratio of the mutant gene.
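The scoring formula can be written out as a short sketch. It follows the expressions exactly as stated, including L(x) = 1/(1 + e^x); the function names are illustrative, and because the text does not state the units or scaling of the affinity values, the inputs are used as given. The overflow guard is an implementation detail added here, not part of the method.

```python
import math


def logistic(x):
    """L(x) = 1 / (1 + e^x), as written in the scoring formula."""
    if x > 700:
        # e^x would overflow a float here; L(x) tends to 0 for large x.
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


def neoantigen_score(ic_m, ic_w, freq, tpm, nctl, mc):
    """score = abundance * dissimilarity * clonality."""
    abundance = logistic(ic_m) * freq * math.tanh(tpm)
    dissimilarity = (1.0 - logistic(ic_w / 50000.0)) / 2.0
    clonality = nctl * mc
    return abundance * dissimilarity * clonality


# Illustrative inputs only; these are not values from the patent.
print(neoantigen_score(ic_m=0.0, ic_w=0.0, freq=1.0, tpm=50.0, nctl=1.0, mc=1.0))
```

Because tanh(TPM) is 0 when TPM is 0, a candidate from an unexpressed gene scores 0 regardless of its predicted affinity.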
Example 1
The above method was used to predict neoantigens for a patient; the specific process was as follows:
The raw sequencing data (Fastq) of the normal tissue and the tumor tissue were aligned using BWA-MEM (an aligner based on the Burrows-Wheeler transform) to obtain raw BAM files; GATK software was then used to remove duplicates from the BAM files and to recalculate the base quality scores, giving the final BAM files of tumor DNA and normal DNA. Based on the BAM files of tumor DNA and normal DNA, somatic short-fragment insertion-deletion sites in the tumor tissue were analyzed using Strelka2, Mutect2, VarScan2, LoFreq, VarDict and Scalpel software; the results of these software packages were integrated using SomaticSeq software, and the actual somatic short-fragment insertion-deletion sites were obtained by machine learning. The actual somatic short-fragment insertion-deletion sites were functionally annotated using ANNOVAR software to obtain the short-fragment insertion-deletion somatic variations, which were then analyzed using CustomProtDB software to obtain a fasta file containing the mutant amino acid sequences. Meanwhile, using the Fastq and BAM files of the normal tissue, the genotype of the MHC-I molecules was analyzed with Polysolver, OptiType, xHLA, HLA-VBSeq and HLA-HD software, and the genotype of the MHC-II molecules with HLA-VBSeq, HLA-HD and xHLA; the MHC-I and MHC-II genotypes obtained from the different software were integrated, and the genotype with the highest consistency was selected. The affinity of the mutant polypeptides for the MHC molecules was then predicted using MHC class I epitope predictors, MHC class II epitope predictors, MHCflurry software and APPM software, respectively, in combination with the previously obtained mutant polypeptide fasta file.
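The mutant polypeptide fasta file is built, per step 4B of the method, from the 15 amino acids before and after each mutation site. A minimal sketch of that windowing follows. It assumes a 0-based site index, includes the residue at the site itself, and clips the window at the ends of the protein; these conventions, the function names and the toy sequence are assumptions for illustration and do not reproduce the behavior of CustomProtDB.

```python
def mutation_window(protein, site, flank=15):
    """Return residues from `flank` before to `flank` after `site`.

    `site` is a 0-based index into `protein`; the residue at `site` is
    included, and the window is clipped at the ends of the sequence.
    """
    start = max(0, site - flank)
    end = min(len(protein), site + flank + 1)
    return protein[start:end]


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text, one sequence per line."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Toy 40-residue sequence standing in for a mutant protein.
protein = "ACDEFGHIKLMNPQRSTVWY" * 2
peptide = mutation_window(protein, site=20)
print(to_fasta([("example_indel_1", peptide)]))
```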
In addition, based on the BAM files of tumor DNA and normal DNA, copy number variation in the tumor tissue was analyzed using CNVkit software; the purity and clonal structure of the tumor tissue were analyzed using ABSOLUTE software based on the copy number and the previously obtained short-fragment insertion-deletion somatic variations, which also yielded corrected mutation frequency values for the DNA mutation sites. The high-throughput sequencing data of the tumor tissue RNA were aligned using STAR software, and the resulting BAM files were de-duplicated using GATK. Based on the de-duplicated transcriptome BAM files, gene expression levels were analyzed using RSEM, and the expression of the mutant allele at each actual somatic short-fragment insertion-deletion site was analyzed at the same time. The fused polypeptide sequences were obtained using CustomProtDB software. Finally, the affinity data of the antigen peptides, the DNA site variation frequency data, the mutant allele expression at the variant sites and the gene expression data were integrated, the potential neoantigens were ranked, and the top-ranked polypeptides were selected as neoantigens.
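The final ranking step can be sketched as a sort over candidates whose scores have already been computed. The record layout, peptide names, scores and the `top_n` cutoff below are hypothetical; the example does not state how many top-ranked polypeptides are kept.

```python
def rank_candidates(candidates, top_n=3):
    """Sort candidate records by descending score and keep the first top_n.

    Each candidate is a dict with at least a "score" key. Python's sort is
    stable, so candidates with equal scores keep their input order.
    """
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
    return ranked[:top_n]


# Hypothetical candidates with precomputed scores.
candidates = [
    {"peptide": "PEP_A", "score": 0.02},
    {"peptide": "PEP_B", "score": 0.11},
    {"peptide": "PEP_C", "score": 0.07},
]
for rank, cand in enumerate(rank_candidates(candidates, top_n=2), start=1):
    print(rank, cand["peptide"], cand["score"])
```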
Three polypeptides derived from frameshift mutations were selected for the analysis below; their information is shown in Table 1:
TABLE 1 frameshift mutation derived polypeptide information
(Table 1 appears as an image in the original publication.)
An enzyme-linked immunospot assay (ELISPOT) was used to verify whether the prediction was successful.
The experimental procedure was as follows:
1) Activate the pre-coated plate: add 200 μL of AIM-V serum-free culture medium, let stand at room temperature for 10 minutes, then discard the medium;
2) Add T cells and polypeptides according to the designed groups, at a concentration of 10-50 μg/mL, with 3 replicate wells per group;
3) After all samples have been added, cover and label the plate, and culture it in an incubator at 37 ℃ and 5% CO2 for 20 hours;
4) Discard the cells and culture medium from the wells;
5) Cell lysis: add 200 μL of ice-cold deionized water to each well and keep the plate in a refrigerator at 4 ℃ for 10 minutes (hypotonic lysis of the cells);
6) Plate washing: wash each well 6 times with 260 μL of 1× Washing buffer, 60 seconds each time, and blot dry on absorbent paper after each wash;
7) Detection antibody incubation: add 100 μL of diluted biotin-labeled antibody to each well and incubate in an incubator at 37 ℃ and 5% CO2 for 1 hour;
8) Plate washing: wash each well 6 times with 260 μL of 1× Washing buffer, 60 seconds each time, and blot dry on absorbent paper after each wash;
9) Avidin incubation: add 100 μL of diluted enzyme-labeled avidin to each well and incubate in an incubator at 37 ℃ and 5% CO2 for 1 hour;
10) Plate washing: wash each well 6 times with 260 μL of 1× Washing buffer, 60 seconds each time, and blot dry on absorbent paper after each wash;
11) Color development: prepare the AEC color developing solution according to the reagent preparation instructions, add 100 μL of color developing solution to each well, develop in an incubator at 37 ℃ and 5% CO2, and check once every 5 minutes;
12) Once the spots have grown to a suitable size, wash twice with deionized water to stop the color development; invert the plate onto absorbent paper and pat off the fine water droplets, then remove the protective layer, leave the plate in a ventilated place at room temperature, and let the membrane air-dry naturally;
13) Count the spots on the ELISPOT plate and record the various spot parameters for analysis.
The ELISPOT results are shown in FIG. 2. The polypeptides derived from frameshift mutations effectively activated the patient's T cells, and the results of the above prediction method were substantially consistent with the experimental results.

Claims (14)

1. A novel antigen prediction method involving frameshift mutations, comprising the steps of:
step 1, comparing sequencing data of tumor cells of a patient with a preset corresponding reference genome to obtain a DNA comparison result and an RNA comparison result;
step 2, preprocessing a DNA comparison result and an RNA comparison result;
step 3, analyzing the short fragment insertion deletion somatic variation, the clone type, the tumor purity, the gene expression quantity in the tumor cells and the expression abundance of the somatic variation alleles by using the DNA comparison result and the RNA comparison result, to obtain an analysis result;
step 4, calculating, from the short fragment insertion deletion somatic variation in the analysis result, the polypeptide generated after mutation, so as to obtain a DNA analysis result;
step 5, predicting the genotype of the MHC class I molecules and the genotype of the MHC class II molecules of the tumor cells of the patient, and analyzing the affinity and the polypeptide presentation efficacy of the potential neoantigens according to the DNA analysis result and the genotypes of the MHC class I molecules and the MHC class II molecules of the tumor cells of the patient;
step 6, scoring and ranking each corresponding neoantigen according to the analysis result, the affinity of the potential neoantigen and the polypeptide presentation efficacy, and presenting the result.
2. The method for predicting the neoantigen related to frameshift mutation according to claim 1, wherein in step 1, when the sequencing data of the tumor cells of the patient are compared with the preset corresponding reference genome, the DNA comparison result is obtained using a Burrows-Wheeler transform algorithm, and the RNA comparison result is obtained using the STAR algorithm.
3. The method of predicting a neoantigen involving a frameshift mutation of claim 1, further comprising, before step 1, the steps of:
step 0, preprocessing the original sequencing data of the tumor cells of the patient to obtain the sequencing data of the tumor cells of the patient.
4. The method of claim 3, wherein in step 0, the preprocessing comprises trimming sequencing adapter sequences, trimming sequencing tag sequences and removing sequences with poor sequencing quality, and is performed using Trimmomatic software.
5. The method for predicting neoantigens involved in frameshift mutations according to claim 1, wherein the preprocessing in step 2 comprises de-duplication, addition of read group information and base quality recalibration, and is performed using GATK software.
6. The method of predicting a neoantigen involving a frameshift mutation of claim 1, wherein the step 3 comprises the steps of:
step 3A, analyzing somatic cell short fragment insertion deletion sites in tumor cells by using a DNA comparison result;
step 3B, filtering false positive sites according to the analysis result of the somatic cell short fragment insertion deletion sites to obtain actual somatic cell short fragment insertion deletion sites;
step 3C, annotating and further filtering the actual somatic cell short segment insertion deletion sites to obtain short segment insertion deletion somatic variation;
step 3D, calculating somatic copy number variation of the tumor;
step 3E, calculating the tumor purity and the clone structure according to the short fragment insertion deletion somatic cell variation and the somatic cell copy number variation of the tumor;
step 3F, analyzing and calculating the gene expression quantity in the tumor cells according to the RNA comparison result;
step 3G, analyzing the actual somatic cell short fragment insertion deletion sites together with the RNA comparison result to obtain the expression abundance of the somatic variation alleles.
7. The method for predicting neoantigens involved in frameshift mutations according to claim 6, wherein in step 3A, analyzing the somatic cell short fragment insertion deletion sites in tumor cells by using the DNA comparison result comprises: analyzing the somatic cell short fragment insertion deletion sites in tumor cells from the DNA comparison result using Mutect2 and/or Strelka2 and/or VarScan2 and/or VarDict and/or LoFreq and/or Scalpel software, respectively.
8. The method for predicting a neoantigen involved in frameshift mutations according to claim 6, wherein in step 3B, said filtering false positive sites according to the analysis result of the somatic cell short fragment insertion deletion sites comprises: performing data integration and false positive site filtering on the analysis result of step 3A using SomaticSeq software.
9. The method of predicting a neoantigen involved in frameshift mutations according to claim 6, wherein in step 3C, annotation is performed using ANNOVAR software; in step 3D, the calculation is performed using CNVkit software; in step 3E, the tumor purity and the clone structure are calculated using ABSOLUTE software, the clone structure referring to the clonal ratio of a mutant gene, namely the clonal ratio of each somatic mutation site; in step 3F, the gene expression quantity in the tumor cells is analyzed and calculated using RSEM software, the analysis of the gene expression quantity referring to using RSEM software to obtain, for each gene, the number of sequenced fragments per kilobase of transcript per million sequenced fragments; in step 3G, the analysis is performed using bcftools software.
10. The method of claim 6, wherein step 4 comprises the steps of:
step 4A, filtering the actual somatic cell short fragment insertion deletion sites to filter out short fragment insertion deletion mutations that would trigger nonsense-mediated mRNA decay;
step 4B, extracting the sequences of 15 amino acids before and after the mutation site according to the annotated actual somatic cell short fragment insertion deletion sites to obtain the amino acid sequences of the mutant polypeptides, and combining all the sequences into a fasta file of polypeptide sequences, namely the short fragment insertion deletion mutant proteins, as the polypeptides generated after mutation.
11. The method of claim 10, wherein in step 4A, filtration is performed using masonmd software; in step 4B, customProtDB software is used for processing.
12. The method of predicting a neoantigen involving frameshift mutations of claim 10, wherein the step 5 comprises the steps of:
step 5A, analyzing the genotype of MHC class I molecules and the genotype of MHC class II molecules of tumor cells of a patient, and selecting the HLA genotype with the highest consistency according to the analysis result to obtain the genotype of the MHC molecules of the patient;
step 5B, predicting the affinity of the short-fragment insertion deletion mutant protein and the MHC molecules of the patient;
step 5C, filtering the polypeptides by predicted affinity to select those whose affinity value is lower than a certain threshold, analyzing the selected polypeptides together with the genotypes of the corresponding MHC molecules of the patient, and calculating the polypeptide presentation efficacy.
13. The method of predicting neoantigens involved in frameshift mutations according to claim 12, wherein in step 5A, the genotype of the MHC class I molecules of the tumor cells of the patient is analyzed using Polysolver software and/or HLA-VBSeq software and/or OptiType software and/or HLA-HD software and/or xHLA software, and the genotype of the MHC class II molecules of the tumor cells of the patient is analyzed using HLA-VBSeq software and/or HLA-HD software and/or xHLA software; in step 5B, MHC class I epitope predictors and/or MHC class II epitope predictors and/or MHCflurry and APPM software provided by IEDB are used for prediction; in step 5C, the analysis and calculation are performed using netCTLpan software.
14. The method of claim 1, wherein the step 6 of scoring each of the corresponding neoantigens based on the results of the analysis, the affinity of the potential neoantigens and the potency of the polypeptide presentation comprises:
$\mathrm{score} = \mathrm{abundance} \times \mathrm{dissimilarity} \times \mathrm{clonality}$
wherein:
$\mathrm{abundance} = L(IC_m) \times \mathrm{Freq} \times \tanh(\mathrm{TPM})$;
$\mathrm{dissimilarity} = (1 - L(IC_w / 50000)) / 2$;
$\mathrm{clonality} = \mathrm{NCTL} \times \mathrm{MC}$;
$L(x) = 1 / (1 + e^{x})$, and $\tanh()$ is the hyperbolic tangent function; score is the score calculated for the neoantigen; abundance is the abundance of the mutant polypeptide; dissimilarity measures the difference in affinity between the mutant peptide and the corresponding normal peptide; clonality measures the efficiency with which the neoantigen is successfully presented; TPM is the gene expression abundance; Freq is the expression abundance ratio of the somatic variation allele, that is, the ratio of the expression abundance of the somatic variation allele to the expression abundance of the allele; $IC_m$ is the mutant polypeptide affinity obtained in the affinity prediction; $IC_w$ is the wild-type polypeptide binding affinity obtained in the affinity prediction; NCTL is the polypeptide presentation efficiency; and MC is the clonal ratio of the mutant gene.
CN202210395664.7A 2022-04-15 2022-04-15 Novel antigen prediction methods involving frameshift mutations Pending CN115747327A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210395664.7A CN115747327A (en) 2022-04-15 2022-04-15 Novel antigen prediction methods involving frameshift mutations

Publications (1)

Publication Number Publication Date
CN115747327A true CN115747327A (en) 2023-03-07

Family

ID=85349043

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210395664.7A Pending CN115747327A (en) 2022-04-15 2022-04-15 Novel antigen prediction methods involving frameshift mutations

Country Status (1)

Country Link
CN (1) CN115747327A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116083587A (en) * 2023-03-15 2023-05-09 中生康元生物科技(北京)有限公司 Method and device for predicting tumor neoantigen based on abnormal variable shear
CN116825188A (en) * 2023-06-25 2023-09-29 北京泛生子基因科技有限公司 Method, device and computer readable storage medium for identifying tumor neoantigen at multiple groups of chemical layers based on high-throughput sequencing technology
CN116825188B (en) * 2023-06-25 2024-04-09 北京泛生子基因科技有限公司 Method, device and computer readable storage medium for identifying tumor neoantigen at multiple groups of chemical layers based on high-throughput sequencing technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination