Disclosure of Invention
The application aims to provide a novel method, device and storage medium for detecting tumor neoantigens based on second-generation sequencing.
To achieve this aim, the application adopts the following technical solution:
In a first aspect, the application discloses a method for detecting tumor neoantigens based on second-generation sequencing, comprising the following steps:
a mutation detection step, comprising detecting tumor somatic point mutations and insertion/deletion (indel) mutations in the alignment files of the sequencing results of a tumor sample and a normal sample using at least two mutation detection programs, and taking the intersection of the mutations detected by the two programs as candidate mutations; at the same time, performing fusion gene detection on the alignment file of the tumor transcriptome sequencing result, and also taking the detected fusion gene mutations as candidate mutations; here, the intersection means the mutations that both programs detect; in one implementation of the application, VarScan and MuTect are used to detect point mutations and indel mutations, and STAR-Fusion is used to detect fusion gene mutations;
an MHC molecule identification step, comprising detecting the HLA types of the normal sample and the tumor sample with each of the HLA typing programs polysolver and BWA mem; if the HLA type of the tumor sample detected by polysolver matches that of the normal sample, outputting it as the HLA subtype result; if the polysolver results for the tumor sample and the normal sample do not match, checking whether the HLA type of the tumor sample detected by BWA mem matches that of the normal sample; if they match, outputting the BWA mem HLA subtype result, and if they still do not match, outputting an empty result, indicating that the HLA subtype cannot be determined;
a mutation annotation step, comprising annotating the point mutations and indel mutations among the candidate mutations from genomic mutations to amino acid mutations; in one implementation of the application, VEP (Variant Effect Predictor) is used for the annotation;
a mutant peptide prediction step, comprising predicting the peptides of the point mutations, indel mutations and fusion gene mutations among the candidate mutations; specifically, for a point mutation, the mutant amino acid is taken as the center and at least 10 amino acids are taken on each side to form the predicted mutant peptide; for an indel mutation, the mutation position is taken as the center, and the predicted mutant peptide extends upstream by at least 10 amino acids and downstream until the position where normal amino acid translation is reached; for a fusion gene mutation, the fusion site is taken as the center and at least 10 amino acids are taken on each of the 3' and 5' sides of the fusion gene to form the predicted mutant peptide; in one implementation of the application, the TransVar tool is used to predict the mutant peptides from genomic mutations;
an MHC class I and MHC class II affinity prediction step for the mutant peptides, comprising taking the HLA (human leukocyte antigen) type of the tumor sample obtained in the MHC molecule identification step, the predicted mutant peptides obtained in the mutant peptide prediction step, and the wild-type peptide sequences corresponding to the predicted mutant peptides as input to MHC class I and MHC class II affinity prediction software, predicting the affinity of the mutant peptides for MHC class I and MHC class II respectively, and taking the mutant peptides with a predicted affinity of less than 500 nM as candidate tumor neoantigens; in one implementation of the application, netMHCpan and netMHCIIpan are used as the affinity prediction software, and 500 nM is the conventional cutoff;
an antigen expression abundance detection step, comprising detecting the expression abundance of each predicted mutant peptide among the candidate tumor neoantigens using antigen expression abundance calculation software; in one implementation of the application, RSEM is used to calculate the TPM value of the mutant peptide as the expression abundance of the neoantigen;
a clonality analysis step, comprising detecting the clonality of each predicted mutant peptide among the candidate tumor neoantigens using mutation clonality analysis software, where clonality is characterized by the proportion of tumor cells in the tested tumor tissue that carry the mutation; in one implementation of the application, PyClone is used to calculate the clonality of the mutation underlying the antigen and to output the probability that the neoantigen is clonal and the probability that it is subclonal, that is, the clonal and subclonal probabilities of the mutation;
a comprehensive scoring and ranking step for the candidate tumor neoantigens, comprising scoring each predicted mutant peptide among the candidate tumor neoantigens according to Formula I, ranking them from high to low by score, and selecting the high-scoring candidates as the tumor neoantigens;
Formula I: $\mathrm{Score}(m) = \mathrm{EpitopeContent}(m) \times \mathrm{ExpressLevel}(m) \times \mathrm{ClonalLevel}(m)$
In Formula I, Score(m) is the total score of predicted mutant peptide m; EpitopeContent(m) is the sum of the scores of all antigen peptides p with MHC affinity that correspond to neoantigen m; ExpressLevel(m) is the antigen expression abundance of neoantigen m; and ClonalLevel(m) is the clonality of neoantigen m.
It can be understood that the application scores and ranks all candidate tumor neoantigens comprehensively. The higher a neoantigen's score, the higher its quality and the better it is expected to perform as a target for cell or vaccine therapy; candidates are therefore preferably selected in descending order of score.
The tumor neoantigen detection method of the application starts directly from the alignment results of second-generation sequencing, detects mutations and MHC (major histocompatibility complex) types, and scores candidate tumor neoantigens from several angles, namely antigen expression abundance, clonality and MHC affinity, so as to screen out high-quality tumor neoantigens. The method therefore has the following advantages: 1) it can screen peptides from multiple variant types, including missense mutations, splice site mutations, frameshift mutations, non-frameshift indels and fusion genes; 2) it can determine the clonality of a neoantigen; 3) it can predict peptide affinity for MHC class I and MHC class II at the same time, and the affinity prediction results are refined using multiple algorithms; 4) it applies false positive filtering to the predicted peptides, using several parameters such as wild-type and homology filtering; 5) it scores and ranks neoantigens by affinity, expression, clonality and other factors to screen out high-quality neoantigens.
Preferably, in the tumor neoantigen detection method of the application, EpitopeContent(m) in Formula I is calculated by Formula II,
Formula II is as follows:
in Formula II, EpitopeScore(p[i:i+k]) is, for each predicted mutant peptide, the sum of the affinities between each MHC and the antigen peptide p[i:i+k] taken from the peptide p that extends k amino acids on each side of the mutant amino acid; i is the index of each antigen peptide of length k that spans the mutation, starting from 0; |p| is the length of the peptide that extends k amino acids on each side of the mutant amino acid; and |p| - k is the upper limit of that index, which corresponds to the total number of antigen peptides spanning the mutation; for MHC class I antigen peptides k is 8, 9, 10 or 11, and for MHC class II antigen peptides k is 15;
preferably, EpitopeScore(p[i:i+k]) is calculated by Formula III,
Formula III: $\mathrm{EpitopeScore}(e) = \sum_{a \in \mathrm{HLA}} \sigma(\mathrm{BindingAffinity}(e,a)) \times \mathrm{SelfFilter}(e,a)$
In Formula III, EpitopeScore(e) is the value of EpitopeScore(p[i:i+k]); the sum over a ∈ HLA of σ(BindingAffinity(e,a)) is the sum of the affinities between core peptide e and all MHC subtypes a; σ(BindingAffinity(e,a)) is calculated by Formula IV; and SelfFilter(e,a) reflects the homology of the antigen peptide;
Formula IV is as follows:
in Formula IV, σ(s) is σ(BindingAffinity(e,a)), e is the base of the natural logarithm, and s is the affinity value between core peptide e and MHC subtype a given by the affinity prediction software;
SelfFilter(e,a) takes its value as follows: for antigen peptide e and MHC subtype a, if a similar (homologous) peptide is found in the normal human genome, SelfFilter(e,a) is 0; otherwise it is 1.
Preferably, in the tumor neoantigen detection method of the application, ExpressLevel(m) in Formula I is obtained as follows: if the antigen expression level of predicted mutant peptide m is lower than $10^{-3}$, ExpressLevel(m) is 0; if it is not lower than $10^{-3}$, ExpressLevel(m) is the antigen expression abundance value output by the antigen expression abundance calculation software. Here, an antigen expression level below $10^{-3}$ is regarded as no expression and is therefore defined as 0, and the antigen expression level is the antigen expression abundance detected by the antigen expression abundance calculation software;
preferably, in the tumor neoantigen detection method of the application, ClonalLevel(m) in Formula I is calculated by Formula V,
Formula V: $\mathrm{ClonalLevel}(m) = p(\mathrm{clonal}) \times (1 - p(\mathrm{subclonal}))$
In Formula V, p(clonal) is the probability, output by the mutation clonality analysis software, that the neoantigen is clonal, and p(subclonal) is the probability, output by the same software, that the neoantigen is subclonal.
Preferably, in the antigen expression abundance detection step, the antigen expression abundance calculation software is RSEM, and the TPM value of the predicted mutant peptide calculated by RSEM is used as the antigen expression abundance.
In the application, neoantigen m denotes the neoantigen derived from one mutation. One mutation can give rise to multiple antigen peptides p, so the formulas of the application sum the scores of all antigen peptides p with antigenic capacity to give the total score of that mutation as a neoantigen. Each mutation is evaluated against the different MHC subtypes, of which there may be several; in a human individual at most 8 can be predicted. Depending on the length k of the peptide bound by MHC, antigen peptides of 5 different lengths are possible. Formula II therefore contains multiple summation symbols. The mutant peptide is the peptide initially predicted to be produced by the mutation, that is, the predicted mutant peptide; the antigen peptide p refers to all potential fixed-length peptides, taken from the mutant peptide, that can be recognized by MHC; the core peptide e is an antigen peptide p that the affinity prediction software predicts to be immunogenic, that is, an antigen peptide p with an affinity of less than 500 nM.
In a second aspect, the application discloses a device for detecting tumor neoantigens based on second-generation sequencing, comprising:
a mutation detection module, configured to detect tumor somatic point mutations and indel mutations in the alignment files of the sequencing results of a tumor sample and a normal sample using at least two mutation detection programs, and to take the intersection of the mutations detected by the two programs as candidate mutations; and at the same time to perform fusion gene detection on the alignment file of the tumor transcriptome sequencing result and also take the detected fusion gene mutations as candidate mutations;
an MHC molecule identification module, configured to detect the HLA types of the normal sample and the tumor sample with each of the HLA typing programs polysolver and BWA mem; if the HLA type of the tumor sample detected by polysolver matches that of the normal sample, to output it as the result; if not, to check whether the HLA type of the tumor sample detected by BWA mem matches that of the normal sample; if they match, to output the BWA mem result, and if they still do not match, to output an empty result;
a mutation annotation module, configured to annotate the point mutations and indel mutations among the candidate mutations from genomic mutations to amino acid mutations;
a mutant peptide prediction module, configured to predict the peptides of the point mutations, indel mutations and fusion gene mutations among the candidate mutations; specifically, for a point mutation, the mutant amino acid is taken as the center and at least 10 amino acids are taken on each side to form the predicted mutant peptide; for an indel mutation, the mutation position is taken as the center, and the predicted mutant peptide extends upstream by at least 10 amino acids and downstream until the position where normal amino acid translation is reached; for a fusion gene mutation, the fusion site is taken as the center and at least 10 amino acids are taken on each of the 3' and 5' sides of the fusion gene to form the predicted mutant peptide;
a mutant peptide MHC class I and MHC class II affinity prediction module, configured to take the HLA type of the tumor sample obtained by the MHC molecule identification module, the predicted mutant peptides obtained by the mutant peptide prediction module, and the wild-type peptide sequences corresponding to the predicted mutant peptides as input to MHC class I and MHC class II affinity prediction software, to predict the affinity of the mutant peptides for MHC class I and MHC class II respectively, and to take the mutant peptides with a predicted affinity of less than 500 nM as candidate tumor neoantigens;
an antigen expression abundance detection module, configured to detect the expression abundance of each predicted mutant peptide among the candidate tumor neoantigens using antigen expression abundance calculation software;
a clonality analysis module, configured to detect the clonality of each predicted mutant peptide among the candidate tumor neoantigens using mutation clonality analysis software, where clonality is characterized by the proportion of tumor cells in the tested tumor tissue that carry the mutation;
a candidate tumor neoantigen comprehensive scoring and ranking module, configured to score each predicted mutant peptide among the candidate tumor neoantigens according to Formula I, rank them from high to low by score, and select the high-scoring candidates as the tumor neoantigens;
Formula I: $\mathrm{Score}(m) = \mathrm{EpitopeContent}(m) \times \mathrm{ExpressLevel}(m) \times \mathrm{ClonalLevel}(m)$
In Formula I, Score(m) is the total score of predicted mutant peptide m; EpitopeContent(m) is the sum of the scores of all antigen peptides p with MHC affinity that correspond to neoantigen m; ExpressLevel(m) is the antigen expression abundance of neoantigen m; and ClonalLevel(m) is the clonality of neoantigen m.
Preferably, in the tumor neoantigen detection device of the application, EpitopeContent(m), ExpressLevel(m) and ClonalLevel(m) in Formula I are calculated as in the tumor neoantigen detection method of the application.
In a third aspect, the application discloses a device for detecting tumor neoantigens based on second-generation sequencing, comprising:
a memory for storing a program;
and a processor for executing the program stored in the memory to implement the tumor neoantigen detection method of the application.
In a fourth aspect, the application discloses a computer-readable storage medium containing a program executable by a processor to implement the tumor neoantigen detection method of the application.
Owing to the above technical solution, the application has the following beneficial effects:
The tumor neoantigen detection method of the application performs mutation and MHC detection directly on the alignment files of second-generation sequencing, and scores candidate tumor neoantigens along three dimensions: MHC class I/II affinity, antigen expression abundance and clonality. This reduces false positives in neoantigen screening, and the scoring and ranking select the neoantigens with higher immunogenicity, so that high-quality tumor neoantigens are screened out and a foundation is laid for immunotherapy based on tumor neoantigens.
Detailed Description
The application is described in further detail below through specific embodiments, with reference to the accompanying drawings. Numerous details are set forth in the following description to provide a better understanding of the application. However, those skilled in the art will readily recognize that some of these features may be omitted in certain cases, or replaced by other elements, materials or methods. In some cases, certain operations related to the application are not shown or described in detail, so that the core of the application is not obscured by excessive description; a detailed description of these operations is unnecessary for those skilled in the art, who can fully understand them from the description in this specification and general knowledge in the art.
As shown in FIG. 1, the method of the application for detecting tumor neoantigens based on second-generation sequencing comprises the following steps:
(1) A mutation detection step, comprising detecting tumor somatic point mutations and insertion/deletion (indel) mutations in the alignment files of the sequencing results of a tumor sample and a normal sample using at least two mutation detection programs, and taking the intersection of the mutations detected by the two programs as candidate mutations; at the same time, performing fusion gene detection on the alignment file of the tumor transcriptome sequencing result, and also taking the detected fusion gene mutations as candidate mutations.
Here, the intersection means the mutations that both programs detect. In some embodiments, VarScan and MuTect are used to detect point mutations and indel mutations, and STAR-Fusion is used to detect fusion gene mutations, that is, STAR-Fusion performs fusion gene detection on the aligned RNA file in BAM format.
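The intersection rule can be sketched as follows. This is a minimal illustration, assuming that each program's calls have already been parsed into (chromosome, position, reference allele, alternate allele) records; the `Variant` type and the `intersect_calls` helper are hypothetical names and are not part of VarScan or MuTect.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal record for one somatic SNV or indel call."""
    chrom: str
    pos: int
    ref: str
    alt: str


def intersect_calls(calls_a: Iterable[Variant],
                    calls_b: Iterable[Variant]) -> List[Variant]:
    """Keep only the variants reported by both callers, in sorted order."""
    return sorted(set(calls_a) & set(calls_b))


# Toy call sets (illustrative values only, not real data).
varscan_calls = [Variant("chr1", 100, "A", "T"), Variant("chr2", 200, "G", "GA")]
mutect_calls = [Variant("chr1", 100, "A", "T"), Variant("chr3", 300, "C", "G")]

candidates = intersect_calls(varscan_calls, mutect_calls)
print(candidates)
```

Matching on the exact (chromosome, position, ref, alt) tuple is a simplifying assumption; indels in particular may need normalization before two callers' records compare equal.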
(2) An MHC molecule identification step, comprising detecting the HLA types of the normal sample and the tumor sample with each of the HLA typing programs polysolver and BWA mem; if the HLA type of the tumor sample detected by polysolver matches that of the normal sample, outputting it as the result; if not, checking whether the HLA type of the tumor sample detected by BWA mem matches that of the normal sample; if they match, outputting the BWA mem result, and if they still do not match, outputting an empty result.
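The fallback logic of this step can be sketched as below. It is only an illustration under stated assumptions: the typing results are taken as already parsed into sets of allele names, "match" is interpreted as exact set equality, and the function name `select_hla_type` is hypothetical.

```python
from typing import FrozenSet, Optional


def select_hla_type(
    polysolver_tumor: FrozenSet[str],
    polysolver_normal: FrozenSet[str],
    bwa_tumor: FrozenSet[str],
    bwa_normal: FrozenSet[str],
) -> Optional[FrozenSet[str]]:
    """Return the HLA type to report, or None for the empty result."""
    # First choice: polysolver, if tumor and normal agree.
    if polysolver_tumor and polysolver_tumor == polysolver_normal:
        return polysolver_tumor
    # Fallback: the BWA mem based typing, if tumor and normal agree.
    if bwa_tumor and bwa_tumor == bwa_normal:
        return bwa_tumor
    # Neither pair agrees: the HLA subtype cannot be determined.
    return None


# Toy allele sets (illustrative values only).
a = frozenset({"HLA-A*02:01", "HLA-B*07:02"})
b = frozenset({"HLA-A*01:01", "HLA-B*07:02"})

print(select_hla_type(a, a, b, b))
print(select_hla_type(a, b, b, b))
print(select_hla_type(a, b, b, a))
```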
(3) A mutation annotation step, comprising annotating the point mutations and indel mutations among the candidate mutations from genomic mutations to amino acid mutations.
In some embodiments, VEP (Variant Effect Predictor) is used for the annotation.
(4) A mutant peptide prediction step, comprising predicting the peptides of the point mutations, indel mutations and fusion gene mutations among the candidate mutations; specifically, for a point mutation, the mutant amino acid is taken as the center and at least 10 amino acids are taken on each side to form the predicted mutant peptide; for an indel mutation, the mutation position is taken as the center, and the predicted mutant peptide extends upstream by at least 10 amino acids and downstream until the position where normal amino acid translation is reached; for a fusion gene mutation, the fusion site is taken as the center and at least 10 amino acids are taken on each of the 3' and 5' sides of the fusion gene to form the predicted mutant peptide.
In some embodiments, the TransVar tool is used to predict the mutant peptides from genomic mutations.
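For the point mutation case, the window extraction can be sketched as follows. The sketch assumes the wild-type protein sequence and the 0-based index of the substituted residue are already known (for example from the annotation step); the function name and arguments are hypothetical, and the indel and fusion cases are not covered.

```python
from typing import Tuple


def point_mutation_window(protein: str, mut_index: int, mut_aa: str,
                          flank: int = 10) -> Tuple[str, str]:
    """Return (mutant_peptide, wild_type_peptide) centered on a substitution.

    `flank` residues are taken on each side; the window is clipped at the
    ends of the protein.
    """
    if not 0 <= mut_index < len(protein):
        raise ValueError("mut_index is outside the protein sequence")
    start = max(0, mut_index - flank)
    end = min(len(protein), mut_index + flank + 1)
    wild_type = protein[start:end]
    mutant = protein[start:mut_index] + mut_aa + protein[mut_index + 1:end]
    return mutant, wild_type


# Toy 30-residue sequence (illustrative only), substitution S -> L at index 15.
protein = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKL"
mutant, wild_type = point_mutation_window(protein, 15, "L")
print(mutant)
print(wild_type)
```

Returning the wild-type window alongside the mutant one reflects the fact that the affinity prediction step takes both sequences as input.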
(5) An MHC class I and MHC class II affinity prediction step for the mutant peptides, comprising taking the HLA type of the tumor sample obtained in the MHC molecule identification step, the predicted mutant peptides obtained in the mutant peptide prediction step, and the wild-type peptide sequences corresponding to the predicted mutant peptides as input to MHC class I and MHC class II affinity prediction software, predicting the affinity of the mutant peptides for MHC class I and MHC class II respectively, and taking the mutant peptides with a predicted affinity of less than 500 nM as candidate tumor neoantigens.
In some embodiments, netMHCpan and netMHCIIpan are used to predict the affinity for MHC class I and MHC class II, respectively.
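The 500 nM cutoff can be applied as in the sketch below. It assumes the predictor output has already been parsed into records holding a peptide, an HLA allele and a predicted affinity in nM; the `BindingPrediction` type and `filter_binders` helper are hypothetical and do not reproduce the output format of netMHCpan or netMHCIIpan.

```python
from typing import Iterable, List, NamedTuple


class BindingPrediction(NamedTuple):
    """Hypothetical parsed prediction: affinity is in nM (lower binds tighter)."""
    peptide: str
    allele: str
    affinity_nm: float


def filter_binders(predictions: Iterable[BindingPrediction],
                   cutoff_nm: float = 500.0) -> List[BindingPrediction]:
    """Keep predictions whose affinity is strictly below the cutoff."""
    return [p for p in predictions if p.affinity_nm < cutoff_nm]


# Toy predictions (illustrative values only).
predictions = [
    BindingPrediction("KLMNPQRLT", "HLA-A*02:01", 42.0),
    BindingPrediction("LMNPQRLTV", "HLA-A*02:01", 500.0),
    BindingPrediction("MNPQRLTVW", "HLA-B*07:02", 1830.5),
]
binders = filter_binders(predictions)
print(binders)
```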
(6) An antigen expression abundance detection step, comprising detecting the expression abundance of each predicted mutant peptide among the candidate tumor neoantigens using antigen expression abundance calculation software.
In some embodiments, RSEM is used to calculate the TPM value of the mutant peptide as the neoantigen expression abundance.
(7) A clonality analysis step, comprising detecting the clonality of each predicted mutant peptide among the candidate tumor neoantigens using mutation clonality analysis software, where clonality is characterized by the proportion of tumor cells in the tested tumor tissue that carry the mutation.
In some embodiments, PyClone is used to calculate the clonality of the mutation underlying the antigen and to output the probability that the neoantigen is clonal and the probability that it is subclonal, that is, the clonal and subclonal probabilities of each mutation.
(8) A comprehensive scoring and ranking step for the candidate tumor neoantigens, comprising scoring each predicted mutant peptide among the candidate tumor neoantigens according to Formula I, ranking them from high to low by score, and selecting the high-scoring candidates as the tumor neoantigens;
Formula I: $\mathrm{Score}(m) = \mathrm{EpitopeContent}(m) \times \mathrm{ExpressLevel}(m) \times \mathrm{ClonalLevel}(m)$
In Formula I, Score(m) is the total score of predicted mutant peptide m; EpitopeContent(m) is the sum of the scores of all antigen peptides p with MHC affinity that correspond to neoantigen m; ExpressLevel(m) is the antigen expression abundance of neoantigen m; and ClonalLevel(m) is the clonality of neoantigen m.
EpitopeContent(m) in Formula I is calculated by Formula II,
Formula II is as follows:
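The equation itself does not appear in this text. One reading that is consistent with the symbol definitions given for Formula II (an index i running from 0 to |p| - k over windows p[i:i+k]) is sketched below. The outer sum over the set K of peptide lengths is an assumption, based on the statement elsewhere in the specification that Formula II contains multiple summation symbols and that k takes the values 8, 9, 10 or 11 for MHC class I and 15 for MHC class II; it should not be taken as the original equation.

```latex
\mathrm{EpitopeContent}(m) \;=\; \sum_{k \in K} \; \sum_{i=0}^{|p|-k} \mathrm{EpitopeScore}\bigl(p[i:i+k]\bigr),
\qquad K = \{8, 9, 10, 11, 15\} \ \text{(assumed)}
```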
in Formula II, EpitopeScore(p[i:i+k]) is, for each predicted mutant peptide, the sum of the affinities between each MHC and the antigen peptide p[i:i+k] taken from the peptide p that extends k amino acids on each side of the mutant amino acid; i is the index of each antigen peptide of length k that spans the mutation, starting from 0; |p| is the length of the peptide that extends k amino acids on each side of the mutant amino acid; and |p| - k is the upper limit of that index, which corresponds to the total number of antigen peptides spanning the mutation;
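The index range described for Formula II can be illustrated by enumerating the windows p[i:i+k] for i = 0 to |p| - k. The helper name `antigen_peptides` is hypothetical. The sketch only enumerates the stated index range; it does not check that each window contains the mutant residue, which depends on how p was cut.

```python
from typing import Iterator, Tuple


def antigen_peptides(p: str, k: int) -> Iterator[Tuple[int, str]]:
    """Yield (i, p[i:i+k]) for i = 0 .. |p| - k inclusive."""
    # range(len(p) - k + 1) stops at i = |p| - k; it is empty if k > |p|.
    for i in range(len(p) - k + 1):
        yield i, p[i:i + k]


# Toy 10-residue peptide (illustrative only).
p = "ABCDEFGHIJ"
windows_8 = list(antigen_peptides(p, 8))
windows_11 = list(antigen_peptides(p, 11))
print(windows_8)
print(windows_11)
```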
EpitopeScore(p[i:i+k]) is calculated by Formula III,
Formula III: $\mathrm{EpitopeScore}(e) = \sum_{a \in \mathrm{HLA}} \sigma(\mathrm{BindingAffinity}(e,a)) \times \mathrm{SelfFilter}(e,a)$
In Formula III, EpitopeScore(e) is the value of EpitopeScore(p[i:i+k]); the sum over a ∈ HLA of σ(BindingAffinity(e,a)) is the sum of the affinities between core peptide e and all MHC subtypes a; σ(BindingAffinity(e,a)) is calculated by Formula IV; and SelfFilter(e,a) reflects the homology of the antigen peptide;
Formula IV is as follows:
in Formula IV, σ(s) is σ(BindingAffinity(e,a)), e is the base of the natural logarithm, and s is the affinity value between core peptide e and MHC subtype a given by the affinity prediction software;
SelfFilter(e,a) takes its value as follows: for antigen peptide e and MHC subtype a, if a similar (homologous) peptide is found in the normal human genome, SelfFilter(e,a) is 0; otherwise it is 1.
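Formula III and the SelfFilter rule can be sketched as below, under several loudly stated assumptions. Formula IV is not reproduced in this text, so the transform σ is passed in as a parameter; `toy_sigma` is a placeholder step function used only to make the example run and is not Formula IV. "Similar peptide" is simplified to an exact substring match against a small list of normal protein sequences, and the dependence of SelfFilter on the MHC subtype a is ignored. All function names are hypothetical.

```python
from typing import Callable, Dict, Sequence


def self_filter(peptide: str, normal_proteins: Sequence[str]) -> int:
    """Return 0 if the peptide occurs in any normal protein, otherwise 1.

    Assumption: exact substring match stands in for the homology search.
    """
    return 0 if any(peptide in protein for protein in normal_proteins) else 1


def epitope_score(peptide: str,
                  affinity_by_allele: Dict[str, float],
                  sigma: Callable[[float], float],
                  normal_proteins: Sequence[str]) -> float:
    """Sum sigma(affinity) * SelfFilter over all HLA alleles (Formula III)."""
    keep = self_filter(peptide, normal_proteins)
    return sum(sigma(s) * keep for s in affinity_by_allele.values())


def toy_sigma(s: float) -> float:
    """Placeholder transform, NOT Formula IV: 1 below 500 nM, else 0."""
    return 1.0 if s < 500.0 else 0.0


# Toy inputs (illustrative values only).
affinities = {"HLA-A*02:01": 120.0, "HLA-B*07:02": 900.0}
normal_proteins = ["MKTAYIAKQRQISFVKSHFSRQ"]

novel = epitope_score("KLMNPQRLT", affinities, toy_sigma, normal_proteins)
self_like = epitope_score("AYIAKQRQI", affinities, toy_sigma, normal_proteins)
print(novel, self_like)
```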
ExpressLevel(m) in Formula I is obtained as follows: if the antigen expression level of predicted mutant peptide m is lower than $10^{-3}$, ExpressLevel(m) is 0; if it is not lower than $10^{-3}$, ExpressLevel(m) is the antigen expression abundance value output by the antigen expression abundance calculation software.
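The thresholding rule for ExpressLevel(m) amounts to a single comparison. In the sketch below the function name `express_level` is hypothetical and the abundance is assumed to be a TPM value already read from the expression software's output.

```python
def express_level(abundance: float, threshold: float = 1e-3) -> float:
    """Return 0 for abundances below the threshold, else the abundance itself."""
    return 0.0 if abundance < threshold else abundance


print(express_level(0.0005))
print(express_level(1e-3))
print(express_level(12.5))
```

A value exactly equal to the threshold is kept, matching the "not lower than" wording.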
ClonalLevel(m) in Formula I is calculated by Formula V,
Formula V: $\mathrm{ClonalLevel}(m) = p(\mathrm{clonal}) \times (1 - p(\mathrm{subclonal}))$
In Formula V, p(clonal) is the probability, output by the mutation clonality analysis software, that the neoantigen is clonal, and p(subclonal) is the probability, output by the same software, that the neoantigen is subclonal.
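Formula V and the final ranking by Formula I can be combined in a short sketch. The record layout, the candidate names and the numeric values are hypothetical; EpitopeContent and ExpressLevel are assumed to have been computed beforehand.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical per-mutation inputs to Formula I."""
    name: str
    epitope_content: float
    express_level: float
    p_clonal: float
    p_subclonal: float


def clonal_level(p_clonal: float, p_subclonal: float) -> float:
    """Formula V: p(clonal) * (1 - p(subclonal))."""
    return p_clonal * (1.0 - p_subclonal)


def score(c: Candidate) -> float:
    """Formula I: EpitopeContent * ExpressLevel * ClonalLevel."""
    return (c.epitope_content * c.express_level
            * clonal_level(c.p_clonal, c.p_subclonal))


def rank_candidates(candidates: List[Candidate]) -> List[Candidate]:
    """Sort candidates from highest to lowest score."""
    return sorted(candidates, key=score, reverse=True)


# Toy candidates (illustrative values only).
candidates = [
    Candidate("m2", 4.0, 0.0, 0.75, 0.25),   # not expressed, so score 0
    Candidate("m3", 1.0, 16.0, 0.5, 0.5),
    Candidate("m1", 2.0, 8.0, 0.75, 0.25),
]
ranked = rank_candidates(candidates)
print([(c.name, score(c)) for c in ranked])
```

Because Formula I is a product, a candidate with zero expression or zero clonal level drops to the bottom of the ranking regardless of its epitope content.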
Those skilled in the art will appreciate that all or part of the functions of the above-described method embodiments may be implemented by hardware, or may be implemented by computer programs. When all or part of the functions of the above embodiments are implemented by a computer program, the program may be stored in a computer-readable storage medium, and the storage medium may include: a read only memory, a random access memory, a magnetic disk, an optical disk, a hard disk, etc., and the program is executed by a computer to realize the above functions. For example, the program may be stored in a memory of the device, and when the program in the memory is executed by the processor, all or part of the functions described above may be implemented. In addition, when all or part of the functions in the above embodiments are implemented by a computer program, the program may be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and may be downloaded or copied to a memory of a local device, or may be version-updated in a system of the local device, and when the program in the memory is executed by a processor, all or part of the functions in the above embodiments may be implemented.
Accordingly, as shown in FIG. 2, in one embodiment of the application, a device for detecting tumor neoantigens based on second-generation sequencing comprises: a mutation detection module 201, an MHC molecule identification module 202, a mutation annotation module 203, a mutant peptide prediction module 204, a mutant peptide MHC class I and MHC class II affinity prediction module 205, an antigen expression abundance detection module 206, a clonality analysis module 207, and a candidate tumor neoantigen comprehensive scoring and ranking module 208.
The mutation detection module 201 is configured to detect tumor somatic point mutations and indel mutations in the alignment files of the sequencing results of a tumor sample and a normal sample using at least two mutation detection programs, and to take the intersection of the mutations detected by the two programs as candidate mutations; and at the same time to perform fusion gene detection on the alignment file of the tumor transcriptome sequencing result and also take the detected fusion gene mutations as candidate mutations. The MHC molecule identification module 202 is configured to detect the HLA types of the normal sample and the tumor sample with each of the HLA typing programs polysolver and BWA mem; if the HLA type of the tumor sample detected by polysolver matches that of the normal sample, it is output as the result; if not, the module checks whether the HLA type of the tumor sample detected by BWA mem matches that of the normal sample; if they match, the BWA mem result is output, and if they still do not match, an empty result is output. The mutation annotation module 203 is configured to annotate the point mutations and indel mutations among the candidate mutations from genomic mutations to amino acid mutations. The mutant peptide prediction module 204 is configured to predict the peptides of the point mutations, indel mutations and fusion gene mutations among the candidate mutations; specifically, for a point mutation, the mutant amino acid is taken as the center and at least 10 amino acids are taken on each side to form the predicted mutant peptide; for an indel mutation, the mutation position is taken as the center, and the predicted mutant peptide extends upstream by at least 10 amino acids and downstream until the position where normal amino acid translation is reached; for a fusion gene mutation, the fusion site is taken as the center and at least 10 amino acids are taken on each of the 3' and 5' sides of the fusion gene to form the predicted mutant peptide. The mutant peptide MHC class I and MHC class II affinity prediction module 205 is configured to take the HLA type of the tumor sample obtained by the MHC molecule identification module, the predicted mutant peptides obtained by the mutant peptide prediction module, and the wild-type peptide sequences corresponding to the predicted mutant peptides as input to MHC class I and MHC class II affinity prediction software, to predict the affinity of the mutant peptides for MHC class I and MHC class II respectively, and to take the mutant peptides with a predicted affinity of less than 500 nM as candidate tumor neoantigens. The antigen expression abundance detection module 206 is configured to detect the expression abundance of each predicted mutant peptide among the candidate tumor neoantigens using antigen expression abundance calculation software. The clonality analysis module 207 is configured to detect the clonality of each predicted mutant peptide among the candidate tumor neoantigens using mutation clonality analysis software, where clonality is characterized by the proportion of tumor cells in the tested tumor tissue that carry the mutation. The candidate tumor neoantigen comprehensive scoring and ranking module 208 is configured to score each predicted mutant peptide among the candidate tumor neoantigens according to Formula I, rank them from high to low by score, and select the high-scoring candidates as the tumor neoantigens.
Another embodiment of the present application further provides a device for detecting tumor neoantigens based on second-generation sequencing, comprising: a memory for storing a program; and a processor for implementing the following method by executing the program stored in the memory:
a mutation detection step, comprising detecting tumor somatic point mutations and insertion-deletion mutations in the alignment files of the sequencing results of a tumor sample and a normal sample using at least two mutation detection programs, and taking the intersection of the mutations detected by the two programs as candidate mutations; and, in parallel, performing fusion gene detection on the alignment file of the tumor transcriptome sequencing result and taking the detected fusion genes as candidate mutations;
an MHC molecule identification step, comprising detecting the HLA molecule types of the normal sample and the tumor sample with the HLA typing programs polysolver and BWA mem respectively; if the HLA molecules of the tumor sample detected by polysolver match those of the normal sample, outputting that result; if they do not match, checking whether the HLA molecules of the tumor sample detected by BWA mem match those of the normal sample, and outputting the BWA mem result if they do and an empty result if they do not;
a mutation annotation step, comprising annotating the point mutations and insertion-deletion mutations among the candidate mutations from genomic mutations to amino acid mutations;
a mutant peptide prediction step, comprising predicting peptides for the point mutations, insertion-deletion mutations and fusion gene mutations among the candidate mutations; specifically, for a point mutation, the mutated amino acid is taken as the center and at least 10 amino acids are taken on each side to form the predicted mutant peptide; for an insertion-deletion mutation, the mutation position is taken as the center, and the peptide extends forward for at least 10 amino acids and extends backward until the position where normal amino acid translation resumes; for a fusion gene mutation, the fusion site is taken as the center and at least 10 amino acids are taken from each of the 3' end and the 5' end of the fusion gene;
an MHC class I and MHC class II affinity prediction step for the mutant peptides, comprising taking the HLA molecule type of the tumor sample obtained in the MHC molecule identification step, the predicted mutant peptides obtained in the mutant peptide prediction step, and the wild-type peptide sequences corresponding to the predicted mutant peptides as input to MHC class I and MHC class II affinity prediction software, predicting the affinity of the mutant peptides for MHC class I and MHC class II molecules respectively, and taking peptides with a predicted affinity below 500 nM as candidate tumor neoantigens;
an antigen expression abundance detection step, comprising determining the antigen expression abundance of each predicted mutant peptide among the candidate tumor neoantigens using antigen expression abundance calculation software;
a clonality analysis step, comprising determining the clonality of each predicted mutant peptide among the candidate tumor neoantigens using mutation clonality analysis software, the clonality being characterized by the proportion of tumor cells in the tested tumor tissue that carry the mutation; and
a comprehensive scoring and ranking step for the candidate tumor neoantigens, comprising scoring each predicted mutant peptide among the candidate tumor neoantigens according to a formula, ranking the candidates from the highest score to the lowest, and selecting the high-scoring candidates as the tumor neoantigens.
Another embodiment of the present application also provides a computer-readable storage medium containing a program executable by a processor to implement the following method:
a mutation detection step, comprising detecting tumor somatic point mutations and insertion-deletion mutations in the alignment files of the sequencing results of a tumor sample and a normal sample using at least two mutation detection programs, and taking the intersection of the mutations detected by the two programs as candidate mutations; and, in parallel, performing fusion gene detection on the alignment file of the tumor transcriptome sequencing result and taking the detected fusion genes as candidate mutations;
an MHC molecule identification step, comprising detecting the HLA molecule types of the normal sample and the tumor sample with the HLA typing programs polysolver and BWA mem respectively; if the HLA molecules of the tumor sample detected by polysolver match those of the normal sample, outputting that result; if they do not match, checking whether the HLA molecules of the tumor sample detected by BWA mem match those of the normal sample, and outputting the BWA mem result if they do and an empty result if they do not;
a mutation annotation step, comprising annotating the point mutations and insertion-deletion mutations among the candidate mutations from genomic mutations to amino acid mutations;
a mutant peptide prediction step, comprising predicting peptides for the point mutations, insertion-deletion mutations and fusion gene mutations among the candidate mutations; specifically, for a point mutation, the mutated amino acid is taken as the center and at least 10 amino acids are taken on each side to form the predicted mutant peptide; for an insertion-deletion mutation, the mutation position is taken as the center, and the peptide extends forward for at least 10 amino acids and extends backward until the position where normal amino acid translation resumes; for a fusion gene mutation, the fusion site is taken as the center and at least 10 amino acids are taken from each of the 3' end and the 5' end of the fusion gene;
an MHC class I and MHC class II affinity prediction step for the mutant peptides, comprising taking the HLA molecule type of the tumor sample obtained in the MHC molecule identification step, the predicted mutant peptides obtained in the mutant peptide prediction step, and the wild-type peptide sequences corresponding to the predicted mutant peptides as input to MHC class I and MHC class II affinity prediction software, predicting the affinity of the mutant peptides for MHC class I and MHC class II molecules respectively, and taking peptides with a predicted affinity below 500 nM as candidate tumor neoantigens;
an antigen expression abundance detection step, comprising determining the antigen expression abundance of each predicted mutant peptide among the candidate tumor neoantigens using antigen expression abundance calculation software;
a clonality analysis step, comprising determining the clonality of each predicted mutant peptide among the candidate tumor neoantigens using mutation clonality analysis software, the clonality being characterized by the proportion of tumor cells in the tested tumor tissue that carry the mutation; and
a comprehensive scoring and ranking step for the candidate tumor neoantigens, comprising scoring each predicted mutant peptide among the candidate tumor neoantigens according to a formula, ranking the candidates from the highest score to the lowest, and selecting the high-scoring candidates as the tumor neoantigens.
The present application is described in further detail below with reference to specific embodiments and the attached drawings. The following examples are intended to be illustrative of the present application only and should not be construed as limiting the present application.
Example 1
This example uses data published in Yadav, Mahesh, et al., "Predicting immunogenic tumour mutations by combining mass spectrometry and exome sequencing," Nature 515.7528 (2014): 572 (hereinafter referred to as document 1), namely the exome data and transcriptome data of tumor samples and normal samples of the mouse model MC-38. Tumor neoantigens were detected with the second-generation-sequencing-based tumor neoantigen detection method of the present application, as follows:
(1) Mutation detection
The BAM files obtained by aligning the DNA sequencing reads of the tumor and normal samples were used to detect tumor somatic point mutations (SNVs) and insertions and deletions (InDels) with VarScan and MuTect. To obtain high-quality mutations, the intersection of the calls from the two programs was taken as the set of high-quality candidate mutations. For fusion gene detection, STAR-Fusion was applied to the aligned RNA BAM files.
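The intersection rule in step (1) can be illustrated with a minimal Python sketch. It assumes the calls of each program have already been parsed into records of chromosome, position, reference allele and alternate allele; the `Variant` type, the function name and the example calls are hypothetical, and a real pipeline would also need to normalize InDel representations before comparing.

```python
from typing import Iterable, NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical minimal variant record; real caller output has more fields."""
    chrom: str
    pos: int
    ref: str
    alt: str


def intersect_calls(calls_a: Iterable[Variant], calls_b: Iterable[Variant]) -> Set[Variant]:
    """Return the variants reported by both callers (exact match on all four fields)."""
    return set(calls_a) & set(calls_b)


# Made-up calls standing in for parsed VarScan and MuTect output.
varscan_calls = [Variant("chr1", 100, "A", "T"), Variant("chr2", 200, "G", "GA")]
mutect_calls = [Variant("chr1", 100, "A", "T"), Variant("chr3", 300, "C", "G")]

candidates = intersect_calls(varscan_calls, mutect_calls)
print(candidates)
```

Only the variant present in both lists survives, which is the behavior the text describes as taking the intersection of the two programs.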
(2) MHC molecule identification
To determine the types of MHC class I and MHC class II molecules, the HLA molecule types of the normal sample and the tumor sample were determined with polysolver in this example. If the HLA molecules detected in the tumor by polysolver match those of the normal sample, that result is output. If they do not match, the BWA mem result is checked: if BWA mem finds that the normal and tumor samples match, the BWA mem result is used; if they still do not match, an empty result is output, indicating that the HLA subtype cannot be determined.
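The fallback logic of step (2) can be sketched as follows. This is an illustration only: the function name is hypothetical, each typing result is assumed to be available as a set of allele names, and "match" is assumed to mean that the tumor and normal sets are identical, which the text does not define precisely.

```python
from typing import FrozenSet, Optional

HlaTyping = FrozenSet[str]


def select_hla_typing(
    polysolver_normal: HlaTyping,
    polysolver_tumor: HlaTyping,
    bwa_normal: HlaTyping,
    bwa_tumor: HlaTyping,
) -> Optional[HlaTyping]:
    """Pick the HLA typing to report, or None when neither tool is consistent."""
    # First choice: polysolver, if tumor and normal agree.
    if polysolver_tumor == polysolver_normal:
        return polysolver_tumor
    # Fallback: the BWA mem based typing, if tumor and normal agree.
    if bwa_tumor == bwa_normal:
        return bwa_tumor
    # Neither tool gives a consistent typing: empty result.
    return None


# Made-up example in which polysolver disagrees and BWA mem agrees.
normal = frozenset({"HLA-A*02:01", "HLA-B*07:02"})
discordant = frozenset({"HLA-A*01:01", "HLA-B*07:02"})
result = select_hla_typing(normal, discordant, normal, normal)
print(sorted(result) if result is not None else "undetermined")
```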
(3) Mutation annotation
For point mutations and InDels, genomic mutations were annotated to amino acid mutations with the VEP (Variant Effect Predictor) tool.
(4) Prediction of mutant peptide fragments
For point mutations and InDels, the mutant peptides were predicted from the genomic mutations with the TransVar tool. For a point mutation, the mutated amino acid is taken as the center and 10 amino acids (14 for MHC class II) are taken on each side to form the final mutant peptide. For an InDel, the mutation position is taken as the center, and the peptide extends forward for 10 amino acids (14 for MHC class II) and extends backward until the position where normal amino acid translation resumes.
For a fusion gene, the fusion site is taken as the center, and 10 amino acids (14 for MHC class II) are taken from each of the 3' end and the 5' end of the fusion gene to form the final mutant peptide.
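The windowing described in step (4) can be sketched in Python for the point-mutation and fusion cases. The sketch assumes the mutant protein sequence and the 0-based index of the mutated residue are already known (in the example these come from TransVar), and that the two fusion partners are supplied as the protein sequence ending at the junction and the protein sequence starting at the junction. The function names and sequences are hypothetical, and the InDel case is omitted because it depends on where normal translation resumes.

```python
def point_mutation_peptide(mutant_protein: str, mut_index: int, flank: int = 10) -> str:
    """Return the mutated residue plus up to `flank` residues on each side.

    The window is clipped at the protein ends, so it can be shorter than
    2 * flank + 1 when the mutation lies near a terminus.
    """
    if not 0 <= mut_index < len(mutant_protein):
        raise ValueError("mut_index is outside the protein")
    start = max(0, mut_index - flank)
    end = min(len(mutant_protein), mut_index + flank + 1)
    return mutant_protein[start:end]


def fusion_peptide(upstream_protein: str, downstream_protein: str, flank: int = 10) -> str:
    """Join the last `flank` residues before the junction to the first `flank` after it."""
    return upstream_protein[-flank:] + downstream_protein[:flank]


# Made-up 30-residue protein with the mutated residue at index 15.
protein = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKL"
peptide_mhc1 = point_mutation_peptide(protein, 15, flank=10)  # MHC class I window
peptide_mhc2 = point_mutation_peptide(protein, 15, flank=14)  # MHC class II window
print(peptide_mhc1, len(peptide_mhc1))
print(peptide_mhc2, len(peptide_mhc2))
```

Passing `flank=14` gives the wider window that the text specifies for MHC class II.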
(5) Prediction of mutant peptide MHC class I/II affinity
The HLA typing of the patient obtained in step (2), the mutant peptide sequences obtained in step (4), and the corresponding wild-type peptide sequences were used as input to the netMHCpan and netMHCIIpan software, which predict the affinity for MHC class I and MHC class II molecules respectively. Peptides with a predicted affinity below 500 nM were taken as potential tumor neoantigens.
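The 500 nM cutoff of step (5) amounts to a simple filter over the prediction results. The sketch below does not call netMHCpan or netMHCIIpan; it assumes their output has already been parsed into records, and the field names (`peptide`, `allele`, `affinity_nm`) and the example values are hypothetical.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]

AFFINITY_THRESHOLD_NM = 500.0  # cutoff stated in the text


def filter_binders(
    predictions: List[Prediction], threshold_nm: float = AFFINITY_THRESHOLD_NM
) -> List[Prediction]:
    """Keep predictions whose affinity is strictly below the threshold (lower = stronger)."""
    return [p for p in predictions if p["affinity_nm"] < threshold_nm]


# Made-up prediction records.
predictions = [
    {"peptide": "ACDEFGHIK", "allele": "allele_1", "affinity_nm": 35.2},
    {"peptide": "CDEFGHIKL", "allele": "allele_1", "affinity_nm": 500.0},
    {"peptide": "DEFGHIKLM", "allele": "allele_2", "affinity_nm": 1200.0},
]

binders = filter_binders(predictions)
print([p["peptide"] for p in binders])
```

The comparison is strict, following the wording "less than 500 nM", so a prediction of exactly 500 nM is not retained.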
(6) Detection of neoantigen expression abundance
The TPM value of the mutant peptide was calculated with the RSEM software and used as the expression abundance of the neoantigen.
(7) Neoantigen clonality analysis
The clonality of the mutation carrying the antigen was calculated with PyClone and is measured as the proportion of tumor cells that carry the mutation.
(8) Comprehensive ranking of neoantigens
Overall, the score of a neoantigen peptide is given by Formula I.
Formula I: $\mathrm{Score}(m) = \mathrm{EpitopeContent}(m) \times \mathrm{ExpressLevel}(m) \times \mathrm{ClonalLevel}(m)$
In Formula I, Score(m) is the total score of the predicted mutant peptide m; EpitopeContent(m) is the sum of the scores of all antigen peptides p with MHC affinity that correspond to the neoantigen m; ExpressLevel(m) is the antigen expression abundance of the neoantigen m; and ClonalLevel(m) is the clonality of the neoantigen m.
EpitopeContent(m) in Formula I is calculated by Formula II.
Formula II: $\mathrm{EpitopeContent}(m) = \sum_{i=0}^{|p|-k} \mathrm{EpitopeScore}(p[i:i+k])$
In Formula II, p is the peptide obtained for the predicted mutant peptide by extending k amino acids on each side of the mutated amino acid; EpitopeScore(p[i:i+k]) is the sum of the affinities of the antigen peptide p[i:i+k] for each MHC; i is the index, starting from 0, of the antigen peptides of length k that span the mutation; |p| is the length of p; and |p|-k is the upper limit of i, so that the sum runs over all antigen peptides that span the mutation.
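The index range described for Formula II corresponds to a sliding window of length k over the peptide p. The Python sketch below enumerates the windows p[i:i+k] for i from 0 to |p|-k and sums a caller-supplied score over them. The function names are hypothetical, and the sketch follows the stated index range literally: it does not check that every window actually contains the mutated residue, which depends on how p was cut.

```python
from typing import Callable, List


def epitope_windows(p: str, k: int) -> List[str]:
    """Return every length-k substring p[i:i+k] for i = 0 .. len(p) - k."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [p[i:i + k] for i in range(len(p) - k + 1)]


def epitope_content(p: str, k: int, epitope_score: Callable[[str], float]) -> float:
    """Sum a per-window score over all windows, as the text describes for Formula II."""
    return sum(epitope_score(e) for e in epitope_windows(p, k))


# Made-up peptide; the constant score simply counts the windows.
windows = epitope_windows("ACDEFGHIK", 7)
total = epitope_content("ACDEFGHIK", 7, lambda e: 1.0)
print(windows, total)
```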
EpitopeScore(p[i:i+k]) is calculated by Formula III.
Formula III: $\mathrm{EpitopeScore}(e) = \sum_{a \in \mathrm{HLA}} \sigma(\mathrm{BindingAffinity}(e,a)) \times \mathrm{SelfFilter}(e,a)$
In Formula III, EpitopeScore(e) is the value of EpitopeScore(p[i:i+k]); the sum gives the total affinity of the core peptide e over all MHC subtypes a; σ(BindingAffinity(e,a)) is calculated by Formula IV; and SelfFilter(e,a) reflects the homology of the antigen peptide.
Formula IV is as follows:
In Formula IV, σ(s) denotes σ(BindingAffinity(e,a)); e is the base of the natural logarithm; and s is the affinity value of the core peptide e for the MHC of subtype a given by the affinity prediction software.
SelfFilter(e,a) is obtained as follows:
For an antigen peptide e and MHC subtype a, SelfFilter(e,a) is 0 if a similar peptide is found in the normal human genome, and 1 otherwise.
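Formula III and the SelfFilter rule can be combined in a short Python sketch. Several parts are assumptions made for illustration: the transform σ is passed in by the caller because Formula IV is not reproduced in this text; "similar peptide" is reduced to exact membership in a precomputed set of normal peptides; and the function names, allele labels and affinity values are hypothetical.

```python
from typing import Callable, Iterable, Set


def self_filter(epitope: str, allele: str, normal_peptides: Set[str]) -> int:
    """Return 0 if the epitope also occurs among normal peptides, else 1.

    Assumption: similarity is simplified to exact membership. The allele
    argument only mirrors SelfFilter(e, a) and is unused in this sketch.
    """
    return 0 if epitope in normal_peptides else 1


def epitope_score(
    epitope: str,
    alleles: Iterable[str],
    binding_affinity: Callable[[str, str], float],
    sigma: Callable[[float], float],
    normal_peptides: Set[str],
) -> float:
    """Sum sigma(BindingAffinity(e, a)) * SelfFilter(e, a) over all alleles a."""
    return sum(
        sigma(binding_affinity(epitope, a)) * self_filter(epitope, a, normal_peptides)
        for a in alleles
    )


# Made-up affinities (nM) for one epitope against two alleles.
affinities_nm = {("ACDEFGHIK", "allele_1"): 100.0, ("ACDEFGHIK", "allele_2"): 900.0}


def toy_sigma(s: float) -> float:
    """Placeholder step function; this is NOT Formula IV."""
    return 1.0 if s < 500.0 else 0.0


def lookup(e: str, a: str) -> float:
    return affinities_nm[(e, a)]


score_novel = epitope_score("ACDEFGHIK", ["allele_1", "allele_2"], lookup, toy_sigma, set())
score_self = epitope_score("ACDEFGHIK", ["allele_1", "allele_2"], lookup, toy_sigma, {"ACDEFGHIK"})
print(score_novel, score_self)
```

An epitope that also occurs among normal peptides contributes nothing to the score, whatever its predicted affinity.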
ExpressLevel(m) in Formula I is obtained as follows:
If the antigen expression abundance of the predicted mutant peptide m is less than $10^{-3}$, ExpressLevel(m) is 0; if it is not less than $10^{-3}$, ExpressLevel(m) is the antigen expression abundance value output by the antigen expression abundance calculation software.
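The thresholding rule for the expression term can be written as a small Python function. The function and parameter names are hypothetical, and the abundance is assumed to be the value reported by the expression software (TPM in this example).

```python
def express_level(abundance: float, threshold: float = 1e-3) -> float:
    """Return 0 below the threshold, otherwise the abundance itself."""
    return 0.0 if abundance < threshold else abundance


# Made-up abundance values: below, at, and above the threshold.
for value in (5e-4, 1e-3, 12.5):
    print(value, express_level(value))
```

Because the rule says "less than", an abundance exactly equal to the threshold is kept.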
ClonalLevel(m) in Formula I is calculated by Formula V.
Formula V: $\mathrm{ClonalLevel}(m) = p(\mathrm{clonal}) \times (1 - p(\mathrm{subclonal}))$
In Formula V, p(clonal) is the probability that the neoantigen is clonal and p(subclonal) is the probability that it is subclonal, both output by the mutation clonality analysis software.
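The clonality term and the final ranking can be sketched together in Python. The sketch assumes the epitope term, the expression term and the two clonality probabilities have already been computed for each candidate; the function names, candidate labels and numbers are all made up for illustration.

```python
from typing import Dict, List, Tuple


def clonal_level(p_clonal: float, p_subclonal: float) -> float:
    """ClonalLevel = p(clonal) * (1 - p(subclonal))."""
    return p_clonal * (1.0 - p_subclonal)


def total_score(epitope_content: float, express_level: float, clonal: float) -> float:
    """Score = EpitopeContent * ExpressLevel * ClonalLevel."""
    return epitope_content * express_level * clonal


def rank_candidates(
    components: Dict[str, Tuple[float, float, float, float]]
) -> List[Tuple[str, float]]:
    """Score each candidate and sort from the highest score to the lowest.

    Each value is (epitope_content, express_level, p_clonal, p_subclonal).
    """
    scored = {
        name: total_score(ec, el, clonal_level(pc, ps))
        for name, (ec, el, pc, ps) in components.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


# Made-up candidates; m3 has zero expression and therefore scores zero.
ranking = rank_candidates({
    "m1": (2.0, 10.0, 0.9, 0.1),
    "m2": (1.0, 50.0, 0.8, 0.0),
    "m3": (3.0, 0.0, 0.9, 0.0),
})
print(ranking)
```

Because the three terms are multiplied, a candidate with zero expression or zero clonality drops to the bottom of the ranking regardless of its predicted affinity.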
The second-generation sequencing data of the mouse model MC-38 published in document 1 were analyzed according to the above method. From the 1290 mutations in the transcriptome region disclosed in document 1, 64 tumor neoantigens were finally screened, including the 3 tumor neoantigens successfully verified by mass spectrometry in document 1. In document 1, the 1290 mutations in the transcriptome region were found in the exon region, 170 neoantigens were predicted, and 3 of them were successfully verified by mass spectrometry. The present method therefore excluded 63.5% of the original false-positive predictions: the number of unverified predictions fell from 167 (170 - 3) to 61 (64 - 3).
Example 2
The tumor neoantigen detection method of example 1 was applied to the published data set ICC24 (Sia D, Losic B, Moeini A, et al. Massive parallel sequencing uncovers actionable FGFR2-PPHLN1 fusion and ARAF mutations in intrahepatic cholangiocarcinoma [J]. Nature Communications, 2015, 6: 6087). The results showed that 5 antigen peptides recognized by HLA were detected by the method of example 1, including an antigen peptide recognized by HLA-01 and derived from FGFR2-PPHLN1, a fusion gene that occurs at high frequency in intrahepatic cholangiocarcinoma (ICC). The method of example 1 thus found a new tumor neoantigen in cholangiocellular carcinoma. Late-stage cholangiocellular carcinoma has no good treatment and a low survival rate; the neoantigen detected by the method of example 1 points to a novel treatment mode and provides a new scheme and approach for treating cholangiocellular carcinoma.
Example 3
The method was used to detect neoantigens in 288 intrahepatic cholangiocarcinoma (ICC) samples, which were taken from the following 4 documents:
Hiromi Nakamura,Yasuhito Arai,Yasushi Totoki,et al.Genomic spectra of biliary tract cancer.[J].Nature Genetics,2015,47(9):1003.
Shanshan Zou,Jiarui Li,Huabang Zhou,et al.Mutational landscape of intrahepatic cholangiocarcinoma.[J].Nature Communications,2014,5:5696.
Yuchen Jiao,Timothy M Pawlik,Robert A Anders,et al.Exome sequencing identifies frequent inactivating mutations in BAP1,ARID1A and PBRM1 in intrahepatic cholangiocarcinomas.[J].Nature Genetics,2013,45(12):1470-U93.
Sia D,Losic B,Moeini A,et al.Massive parallel sequencing uncovers actionable FGFR2–PPHLN1 fusion and ARAF mutations in intrahepatic cholangiocarcinoma.[J].Nature Communications,2015,6:6087-6087.
The analysis of 18813 non-synonymous mutations from the 288 ICC samples showed that, on average, 22.8 mutant antigen peptides recognized by HLA genotypes that occur at high frequency in the human population were found per ICC sample, 62% of which arose from clonal mutations. This indicates that, when no suitable targeted drug is available, these patients could be treated with precise cellular immunotherapy.
The foregoing is a more detailed description of the present application in connection with specific embodiments thereof, and it is not intended that the present application be limited to the specific embodiments thereof. For those skilled in the art to which the present application pertains, several simple deductions or substitutions may be made without departing from the concept of the present application, and all should be considered as belonging to the protection scope of the present application.