Neoantigen activity prediction and ranking method based on tumor neoantigen characteristic values
Technical Field
The invention relates to the field of tumor immunotherapy, and in particular to a method for scoring and ranking neoantigen activity based on tumor neoantigen characteristic values.
Background
In recent years, tumor immunotherapy has flourished: clinical trials have produced a series of breakthroughs, and cure rates and effective remission rates have continued to improve. Efficient and accurate screening of tumor neoantigens is fundamental to tumor immunotherapy, and is especially important for therapies such as TCR-T/TIL and personalized vaccines.
At present, the prevailing scheme for screening tumor neoantigens has two steps. Step one calls tools such as MuTect/VarScan to identify the gene mutations of tumor cells from WGS/WES data of tumor-normal tissue pairs, and step two calls algorithms such as NetMHCpan to predict MHC-I binding neoantigens.
However, no effective method is currently available for ranking antigens by activity so as to improve screening efficiency. Because the above scheme does not rank the predicted MHC-I binding neoantigens by activity, every candidate must be verified experimentally, which creates a heavy workload and makes tumor neoantigen screening inefficient.
Disclosure of Invention
In view of the defects of the prior art, the invention provides a method for scoring and ranking neoantigen activity based on tumor neoantigen characteristic values, which greatly reduces the workload of experimental verification and thereby enables efficient and accurate screening of tumor neoantigens.
The technical scheme of the invention is realized as follows:
A method for predicting and ranking the immunological activity of neoantigens based on tumor neoantigen characteristic values, characterized by comprising the following steps:
(1) input of WGS/WES and RNA-seq sequencing data for tumor-normal samples: inputting whole-genome sequencing (WGS) data or whole-exome sequencing (WES) data, together with transcriptome sequencing (RNA-seq) data, of a tumor-normal sample pair;
(2) prediction and annotation of tumor somatic mutations, and calculation of related characteristic values: based on the sequencing data input in step (1), calling the VarScan or MuTect tool to identify tumor somatic mutations, calling the VEP (Variant Effect Predictor) tool to annotate the mutations, and calling the PyClone, Kallisto, and VarScan or MuTect tools to calculate the following characteristic values: the clonal proportion of the mutant gene, the expression value (TPM) of the mutant gene, and the variant allele frequency (VAF);
(3) prediction of MHC-I binding neoantigens based on tumor somatic mutations, and calculation of related characteristic values: based on the tumor somatic mutation and annotation data from step (2), calling the NetMHCpan, NetChop and OptiType tools to predict MHC-I binding neoantigens, and calculating the following characteristic values: the affinity percentile rank of the mutant peptide for MHC, the affinity percentile rank of the unmutated peptide for MHC, and the cleavage and presentation efficiency of the peptide;
(4) extraction of all neoantigen-related characteristic values: extracting all related characteristic values for each MHC-I binding neoantigen predicted in step (3);
(5) setting of the neoantigen activity scoring function: setting a neoantigen activity scoring function over the neoantigen characteristic values extracted in step (4);
(6) neoantigen ranking based on the neoantigen activity scoring function: ranking the neoantigens by the neoantigen activity scoring function.
Preferably, in step (4), the neoantigen-related characteristic values include R_m, A, R_n, E, NC and CL, wherein:
R_m: affinity percentile rank of the mutant peptide for MHC, calculated by NetMHCpan;
A: variant allele frequency (VAF), calculated by VarScan/MuTect/Strelka2;
R_n: affinity percentile rank of the unmutated peptide for MHC, calculated by NetMHCpan;
E: expression value (TPM) of the mutant gene, calculated by Kallisto;
NC: cleavage and presentation efficiency of the peptide, calculated by NetChop;
CL: clonal proportion of the mutant gene, calculated by PyClone.
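As an illustration only, the six characteristic values listed above could be carried through a pipeline as one record per candidate peptide. The class name, field names, the `peptide` identifier field and the range checks in this sketch are assumptions made for the example; they are not part of the described method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NeoantigenFeatures:
    """Characteristic values of one predicted MHC-I binding neoantigen."""
    peptide: str  # hypothetical identifier field, for bookkeeping only
    r_m: float    # R_m: affinity percentile rank of the mutant peptide
    a: float      # A: variant allele frequency (VAF)
    r_n: float    # R_n: affinity percentile rank of the unmutated peptide
    e: float      # E: expression of the mutant gene, in TPM
    nc: float     # NC: cleavage and presentation efficiency
    cl: float     # CL: clonal proportion of the mutant gene

    def __post_init__(self):
        # Sanity checks; these value ranges are assumptions, not from the source.
        if not 0.0 <= self.a <= 1.0:
            raise ValueError("A (VAF) is assumed to lie in [0, 1]")
        if min(self.r_m, self.r_n, self.e, self.nc, self.cl) < 0.0:
            raise ValueError("characteristic values are assumed non-negative")


# Made-up values for a placeholder peptide, not data from any patient sample.
example = NeoantigenFeatures("PEPTIDE_1", r_m=0.5, a=0.4, r_n=3.0,
                             e=12.0, nc=0.9, cl=0.8)
print(example)
```

Keeping the record immutable (`frozen=True`) is a design choice for the sketch: the values come from upstream tools and should not change between extraction and scoring.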
Preferably, in step (5), the proposed prediction scoring function for neoantigen activity is:
Neo_Score = abundance · dissimilarity · clonality;
clonality = NC · CL;
abundance = L(R_m) · A · tanh(E/k);
dissimilarity = 1 - L(R_n)/2;
wherein L(x) = 1/(1 + e^{5(x-2)}), and tanh(x) is the hyperbolic tangent function;
k is the transcript expression abundance threshold with a default value of 1.
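The scoring function can be written out as a minimal Python sketch. The function and argument names are chosen for this example. The sketch takes the dissimilarity term to be 1 - L(R_n)/2 (rather than (1 - L(R_n))/2), which is an interpretation of the formula, and it assumes that R_m and R_n are percentile ranks between 0 and 100.

```python
import math


def L(x: float) -> float:
    """Logistic mapping L(x) = 1 / (1 + e^(5(x - 2))).

    Close to 1 for percentile ranks well below 2, close to 0 well above 2.
    """
    return 1.0 / (1.0 + math.exp(5.0 * (x - 2.0)))


def neo_score(r_m: float, a: float, r_n: float, e: float,
              nc: float, cl: float, k: float = 1.0) -> float:
    """Neo_Score = abundance * dissimilarity * clonality."""
    clonality = nc * cl
    abundance = L(r_m) * a * math.tanh(e / k)
    # Assumed reading of the dissimilarity term: 1 - L(R_n) / 2.
    dissimilarity = 1.0 - L(r_n) / 2.0
    return abundance * dissimilarity * clonality


# Made-up feature values, for illustration only.
print(neo_score(r_m=0.5, a=0.4, r_n=3.0, e=12.0, nc=0.9, cl=0.8))
```

Under this reading the dissimilarity factor stays between 0.5 and 1, so a strongly binding unmutated peptide halves the score at most and never removes a candidate outright.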
Preferably, in step (6), the algorithm for ranking the neoantigens by the neoantigen activity prediction function is as follows:
a) for every predicted MHC-I binding neoantigen, calling the neoantigen activity prediction function Neo_Score to calculate the predicted neoantigen activity;
b) sorting the neoantigens by predicted activity using the quicksort algorithm;
c) outputting the neoantigen ranking result.
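Steps a) to c) can be sketched as follows, assuming the activity scores have already been computed and are held in a dictionary keyed by peptide. The sketch uses Python's built-in `sorted` in place of the quicksort named in the text; apart from the order of tied scores, the resulting ranking is the same. The function name, peptide labels and score values are placeholders.

```python
def rank_neoantigens(scores: dict) -> list:
    """Return (rank, peptide, score) tuples, highest score first."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(rank, peptide, score)
            for rank, (peptide, score) in enumerate(ordered, start=1)]


# Placeholder peptides and scores, not results from any patient sample.
toy_scores = {"PEP_A": 0.12, "PEP_B": 0.57, "PEP_C": 0.03}

for rank, peptide, score in rank_neoantigens(toy_scores):
    print(rank, peptide, score)
```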
The design rationale and beneficial effects of the above technical scheme are as follows:
The technical scheme provides a scoring function for predicting tumor neoantigen activity and ranks the predicted MHC-I binding neoantigens by that function, thereby achieving efficient and accurate tumor neoantigen screening.
The function is designed around the complete process by which a tumor neoantigen is generated, cleaved, transported and bound to MHC, and it is divided into three parts. Clonality measures the efficiency NC with which the neoantigen is processed from the mutation into a short peptide and the proportion CL of tumor cells that carry the neoantigen; it is an important factor influencing the efficacy of a tumor vaccine. Abundance measures the expression level of the neoantigen and the efficiency with which it binds MHC-I to form a pMHC complex: the higher the expression level E of the mutant gene, the higher the variant allele frequency A, and the higher the binding affinity (the smaller the percentile rank R_m, and correspondingly the smaller the IC50 value), the stronger the immunogenicity. Dissimilarity measures, through R_n, the difference in MHC affinity between the mutant peptide and the corresponding normal peptide. The two mapping functions normalize the calculated values to the range 0 to 1: the threshold 2 in L(x) is the peptide-MHC binding affinity screening threshold, and tanh(x/k) ensures that the function value changes smoothly once the expression abundance of the neoantigen exceeds the set threshold k.
Because the activity prediction function takes into account the full set of factors involved in neoantigen generation, the resulting ranking is more meaningful and has greater practical value.
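To make the behavior of the two mapping functions concrete, the short sketch below evaluates them at a few points. The sample percentile ranks and TPM values are arbitrary and chosen only for illustration; k is set to its default of 1.

```python
import math


def L(x: float) -> float:
    """L(x) = 1 / (1 + e^(5(x - 2))), the affinity-rank mapping."""
    return 1.0 / (1.0 + math.exp(5.0 * (x - 2.0)))


k = 1.0  # default expression abundance threshold

# L(x) drops from about 1 to about 0 around the screening threshold x = 2.
for rank in (0.5, 2.0, 5.0):
    print(f"L({rank}) = {L(rank):.6f}")

# tanh(E/k) rises smoothly and saturates toward 1 once E exceeds k.
for tpm in (0.1, 1.0, 3.0, 10.0):
    print(f"tanh({tpm}/k) = {math.tanh(tpm / k):.6f}")
```

At the threshold itself L(2) is exactly 0.5, and for E much larger than k the expression term contributes a factor close to 1, so very highly expressed genes are not rewarded without limit.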
Drawings
FIG. 1 is a schematic diagram of the neoantigen activity prediction and ranking method based on tumor neoantigen characteristic values according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
Example one:
As shown in FIG. 1, a method for predicting and ranking neoantigen activity based on tumor neoantigen characteristic values comprises the following steps:
Step 101: input of WGS/WES and RNA-seq sequencing data for tumor-normal samples (using melanoma patient sample one, mel_21, from Carreno B M, Magrini V, Becker-Hapak M, et al. Cancer immunotherapy. A dendritic cell vaccine increases the breadth and diversity of melanoma neoantigen-specific T cells [J]. Science, 2015, 348(6236): 803-8).
Step 102: prediction and annotation of tumor somatic mutations, and calculation of related characteristic values: based on the WGS/WES and RNA-seq sequencing data of the tumor-normal sample, tools such as VarScan/MuTect are called to identify tumor somatic mutations, the VEP (Variant Effect Predictor) tool is called to annotate the mutations, and tools such as PyClone, Kallisto and VarScan/MuTect are called to calculate the following characteristic values: the mutant gene clonal proportion CL, the mutant gene expression value TPM and the variant allele frequency VAF. Taking the peptides reported as active in the literature as examples, the E (TPM) and A (VAF) values calculated for the 3 peptides of patient sample mel_21 are as follows:
TABLE 1
Step 103: prediction of MHC-I binding neoantigens based on tumor somatic mutations, and calculation of related characteristic values: based on the tumor somatic mutation and annotation data from step 102, the NetMHCpan, NetChop and OptiType tools are called to predict MHC-I binding neoantigens and to calculate the following characteristic values: the affinity percentile rank R_m of the mutant peptide for MHC, the affinity percentile rank R_n of the unmutated peptide for MHC, and the peptide cleavage and presentation efficiency NC. Taking the peptides reported as active in the literature as examples, the R_m, R_n and NC values calculated for the 3 peptides of patient sample mel_21 are as follows:
TABLE 2
Step 104: extraction of neoantigen-related characteristic values: all related characteristic values are extracted for the MHC-I binding neoantigens predicted in step 103; the neoantigens and characteristic values of patient sample mel_21 are shown in Tables 1 and 2.
Step 105: setting of the neoantigen activity scoring function: a neoantigen activity scoring function is set over the neoantigen characteristic values extracted in step 104.
Step 106: neoantigen ranking based on the neoantigen activity scoring function: the neoantigens are ranked by the neoantigen activity scoring function. The scores (Neo_Score) of the neoantigens confirmed by pMHC activity assay in patient sample mel_21, and their ranks among all candidates (Rank), are given in Table 3.
TABLE 3
Of the 3 confirmed neoantigens, CLNEYHLFL already showed CD8+ T cell-activating immune activity before stimulation with the dendritic cell (DC) tumor vaccine, and the remaining 2 showed varying degrees of immune activity after the vaccine was used to boost the immune response. KMIGNHLWV and CLNEYHLFL rank in the top 3 of 94 candidate neoantigens. AMFWSVPTS is expressed at a low level, both in the in vitro experiments and in the human tumor microenvironment, so its immunogenicity and immune response are weak; it ranks 33rd in our results. The experimental results are therefore consistent with the predicted ranking.
Example two:
As shown in FIG. 1, a method for predicting and ranking neoantigen activity based on tumor neoantigen characteristic values comprises the following steps:
Step 101: input of WGS/WES and RNA-seq sequencing data for tumor-normal samples (using melanoma patient sample two, mel_38, from Carreno B M, Magrini V, Becker-Hapak M, et al. Cancer immunotherapy. A dendritic cell vaccine increases the breadth and diversity of melanoma neoantigen-specific T cells [J]. Science, 2015, 348(6236): 803-8).
Step 102: prediction and annotation of tumor somatic mutations, and calculation of related characteristic values: based on the WGS/WES and RNA-seq sequencing data of the tumor-normal sample, tools such as VarScan/MuTect are called to identify tumor somatic mutations, the VEP (Variant Effect Predictor) tool is called to annotate the mutations, and the PyClone, Kallisto and VarScan/MuTect tools are called to calculate the following characteristic values: the mutant gene clonal proportion CL, the mutant gene expression value TPM and the variant allele frequency VAF. Taking the peptides reported as active in the literature as examples, the E (TPM) and A (VAF) values calculated for the 3 peptides of patient sample two, mel_38, are as follows:
TABLE 4
Step 103: prediction of MHC-I binding neoantigens based on tumor somatic mutations, and calculation of related characteristic values: based on the tumor somatic mutation and annotation data from step 102, the NetMHCpan, NetChop and OptiType tools are called to predict MHC-I binding neoantigens and to calculate the following characteristic values: the affinity percentile rank R_m of the mutant peptide for MHC, the affinity percentile rank R_n of the unmutated peptide for MHC, and the peptide cleavage and presentation efficiency NC. Taking the peptides reported as active in the literature as examples, the R_m, R_n and NC values calculated for the 3 peptides of patient sample two, mel_38, are as follows:
TABLE 5
Step 104: extraction of neoantigen-related characteristic values: all related characteristic values are extracted for the MHC-I binding neoantigens predicted in step 103; the neoantigens and characteristic values of patient sample two, mel_38, are shown in Tables 4 and 5.
Step 105: setting of the neoantigen activity scoring function: a neoantigen activity scoring function is set over the neoantigen characteristic values extracted in step 104.
Step 106: neoantigen ranking based on the neoantigen activity scoring function: the neoantigens are ranked by the neoantigen activity scoring function. The scores (Neo_Score) of the neoantigens confirmed by pMHC activity assay in patient sample two, mel_38, and their ranks among all candidates (Rank), are given in Table 6.
TABLE 6
Of the 3 confirmed neoantigens, FLYNLLTRVY already showed CD8+ T cell-activating immune activity before stimulation with the dendritic cell (DC) tumor vaccine, and the remaining 2 showed varying degrees of immune activity after the vaccine was used to boost the immune response. QLSCISTYV and FLYNLLTRVY rank in the top 20 of 117 candidate neoantigens. KLMNIQQKL is expressed at a low level, both in the in vitro experiments and in the human tumor microenvironment, so its immunogenicity and immune response are weak; it ranks 66th in our results. The experimental results are therefore consistent with the predicted ranking.
In conclusion, the neoantigen immune activity scoring function provided by the invention can effectively measure the immune activity of neoantigens, and it supports clinical trials, tumor research and immunotherapy.